DPDK patches and discussions
* [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK
@ 2018-09-04 13:11 Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 01/16] mem: add length to memseg list Anatoly Burakov
                   ` (37 more replies)
  0 siblings, 38 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have IDs in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
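
To illustrate the IOVA point in the list above, here is a minimal
standalone sketch of recording per-page IOVA addresses for an external
chunk. It is not code from this patchset: struct sketch_page,
sketch_fill_pages() and SKETCH_BAD_IOVA are hypothetical names, and
SKETCH_BAD_IOVA is only a local stand-in for RTE_BAD_IOVA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for RTE_BAD_IOVA so the sketch compiles on its own. */
#define SKETCH_BAD_IOVA UINT64_MAX

/* Hypothetical per-page record, loosely modelled on a memseg. */
struct sketch_page {
	void *va;      /* virtual address of the page */
	uint64_t iova; /* IO address, or SKETCH_BAD_IOVA if unknown */
	size_t len;    /* page length in bytes */
};

/*
 * Describe n_pages pages starting at base. iovas may be NULL, in which
 * case every page is marked with SKETCH_BAD_IOVA.
 */
static void
sketch_fill_pages(struct sketch_page *pages, size_t n_pages, void *base,
		size_t page_sz, const uint64_t *iovas)
{
	size_t i;

	assert(pages != NULL && base != NULL && page_sz > 0);

	for (i = 0; i < n_pages; i++) {
		pages[i].va = (char *)base + i * page_sz;
		pages[i].len = page_sz;
		pages[i].iova = iovas != NULL ? iovas[i] : SKETCH_BAD_IOVA;
	}
}
```

A caller that has no IOVA information passes NULL for iovas; anything
that later needs a real IO address can then test for the sentinel.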

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create a heap and add memory to it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
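
The create-then-add flow can be modelled with a toy, single-process
sketch. All names below (sketch_heap_create(), sketch_heap_add_memory(),
struct sketch_heap) are hypothetical and are not the API added by this
series; the one-chunk-per-heap limit is a simplification, and the
attach step for other processes is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_HEAPS 4
#define SKETCH_NAME_LEN 32

/* Hypothetical named heap; an empty name marks a free slot. */
struct sketch_heap {
	char name[SKETCH_NAME_LEN];
	void *base; /* start of the single memory chunk, or NULL */
	size_t len; /* length of that chunk in bytes */
};

static struct sketch_heap sketch_heaps[SKETCH_MAX_HEAPS];

/* Create a uniquely named heap; return its index, or -1 on failure. */
static int
sketch_heap_create(const char *name)
{
	int i, free_idx = -1;

	assert(name != NULL);
	if (name[0] == '\0' || strlen(name) >= SKETCH_NAME_LEN)
		return -1;

	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		if (strcmp(sketch_heaps[i].name, name) == 0)
			return -1; /* names must be unique */
		if (free_idx < 0 && sketch_heaps[i].name[0] == '\0')
			free_idx = i;
	}
	if (free_idx < 0)
		return -1; /* no free slots left */

	strcpy(sketch_heaps[free_idx].name, name);
	sketch_heaps[free_idx].base = NULL;
	sketch_heaps[free_idx].len = 0;
	return free_idx;
}

/* Hand a caller-owned memory chunk to an existing heap. */
static int
sketch_heap_add_memory(int heap_idx, void *base, size_t len)
{
	if (heap_idx < 0 || heap_idx >= SKETCH_MAX_HEAPS)
		return -1;
	if (sketch_heaps[heap_idx].name[0] == '\0')
		return -1; /* heap was never created */
	if (base == NULL || len == 0)
		return -1;
	if (sketch_heaps[heap_idx].base != NULL)
		return -1; /* this toy model holds one chunk per heap */

	sketch_heaps[heap_idx].base = base;
	sketch_heaps[heap_idx].len = len;
	return 0;
}
```

As in the proposal, the sketch never checks that the memory is actually
usable; that remains the caller's responsibility.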

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed better: we don't want
creating an external heap in the primary to fail because something in
a secondary has failed, when in fact we may not even have wanted this
memory to be accessible in the secondary in the first place.

Using external memory in multiprocess is *hard*: not only does the
memory space need to be preallocated, it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (16):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  test: add unit tests for external memory support

 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 drivers/bus/fslmc/fslmc_vfio.c                |   7 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   9 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   6 +-
 lib/librte_eal/common/include/rte_malloc.h    | 181 +++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_heap.c           | 287 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 383 ++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal.c             |   3 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  17 +-
 lib/librte_eal/rte_eal_version.map            |   7 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  31 +-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 384 ++++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 34 files changed, 1346 insertions(+), 84 deletions(-)
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1


* [dpdk-dev] [PATCH 01/16] mem: add length to memseg list
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 02/16] mem: allow memseg lists to be marked as external Anatoly Burakov
                   ` (36 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	thomas

Previously, to calculate the length of the memory area covered by a
memseg list, we had to multiply the page size by the length of the
fbarray backing that memseg list. This is not obvious and is
unnecessarily low level, so store the length in the memseg list itself.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
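
A standalone sketch of the idea, for illustration only: struct
sketch_msl and both helpers are hypothetical, and n_pages stands in
for memseg_arr.len. The length is computed once at init time and then
reused for range checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct rte_memseg_list. */
struct sketch_msl {
	void *base_va;        /* start of the VA area */
	uint64_t page_sz;     /* page size for all segments */
	unsigned int n_pages; /* stands in for memseg_arr.len */
	size_t len;           /* cached length of the VA area */
};

/* Compute the covered length once, when the list is set up. */
static void
sketch_msl_init(struct sketch_msl *msl, void *base, uint64_t page_sz,
		unsigned int n_pages)
{
	assert(msl != NULL && base != NULL);

	msl->base_va = base;
	msl->page_sz = page_sz;
	msl->n_pages = n_pages;
	msl->len = (size_t)page_sz * n_pages;
}

/* Range check that only needs base_va and the cached length. */
static bool
sketch_msl_contains(const struct sketch_msl *msl, const void *addr)
{
	uintptr_t start = (uintptr_t)msl->base_va;
	uintptr_t end = start + msl->len;
	uintptr_t a = (uintptr_t)addr;

	return a >= start && a < end;
}
```
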
 drivers/bus/pci/linux/pci.c                       | 2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
 lib/librte_eal/common/eal_common_memory.c         | 5 ++---
 lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
 6 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 static int
 find_max_end_va(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t sz = msl->memseg_arr.len * msl->page_sz;
+	size_t sz = msl->len;
 	void *end_va = RTE_PTR_ADD(msl->base_va, sz);
 	void **max_va = arg;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
 		}
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
+		msl->len = internal_config.memory;
 		msl->socket_id = 0;
 
 		/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index fbfb1b055..0868bf681 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
 
 	/* a memseg list was specified, check if it's the right one */
 	start = msl->base_va;
-	end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+	end = RTE_PTR_ADD(start, msl->len);
 
 	if (addr < start || addr >= end)
 		return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
 		msl = &mcfg->memsegs[msl_idx];
 
 		start = msl->base_va;
-		end = RTE_PTR_ADD(start,
-				(size_t)msl->page_sz * msl->memseg_arr.len);
+		end = RTE_PTR_ADD(start, msl->len);
 		if (addr >= start && addr < end)
 			break;
 	}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
+	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index aa95551a8..d040a2f71 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -828,7 +828,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	int msl_idx, seg_idx, ret, dir_fd = -1;
 
 	start_addr = (uintptr_t) msl->base_va;
-	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+	end_addr = start_addr + msl->len;
 
 	if ((uintptr_t)wa->ms->addr < start_addr ||
 			(uintptr_t)wa->ms->addr >= end_addr)
@@ -1314,6 +1314,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 		return -1;
 	}
 	local_msl->base_va = primary_msl->base_va;
+	local_msl->len = primary_msl->len;
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index dbf19499e..c522538bf 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -857,6 +857,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
@@ -1365,6 +1366,7 @@ eal_legacy_hugepage_init(void)
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
 		msl->socket_id = 0;
+		msl->len = internal_config.memory;
 
 		/* populate memsegs. each memseg is one page long */
 		for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1611,7 +1613,7 @@ eal_legacy_hugepage_init(void)
 		if (msl->memseg_arr.count > 0)
 			continue;
 		/* this is an unused list, deallocate it */
-		mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
 
-- 
2.17.1


* [dpdk-dev] [PATCH 02/16] mem: allow memseg lists to be marked as external
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 01/16] mem: add length to memseg list Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 03/16] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
                   ` (35 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: Hemant Agrawal, Shreyansh Jain, Matan Azrad, Shahaf Shuler,
	Yongseok Koh, Maxime Coquelin, Tiwei Bie, Zhihong Wang,
	Bruce Richardson, Olivier Matz, Andrew Rybchenko,
	laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

All current callers of memseg walk functions were adjusted to
ignore external segments where it made sense. Mempools are a
special case, because we may be asked to allocate a mempool on
a specific socket, and we need to ignore all page sizes on
other heaps or other sockets. Previously, this assumption of
knowing all page sizes was not a problem, but it will be now,
so we have to match socket ID with page size when calculating
minimum page size for a mempool.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so
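
    For illustration, the socket/page size matching described in the
    commit message can be sketched over a plain array instead of a
    memseg list walk. Everything below is hypothetical (struct
    sketch_list, sketch_min_page_size(), SKETCH_SOCKET_ID_ANY), and the
    fallback argument stands in for the getpagesize() call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for SOCKET_ID_ANY. */
#define SKETCH_SOCKET_ID_ANY (-1)

/* Hypothetical, trimmed-down view of a memseg list. */
struct sketch_list {
	int socket_id;  /* NUMA socket, or the ID of an external heap */
	size_t page_sz; /* page size of all segments in the list */
	bool external;  /* true if the list covers external memory */
};

/*
 * Smallest page size among lists that match socket_id. With
 * SKETCH_SOCKET_ID_ANY, every internal list matches and external
 * lists are skipped. Returns fallback if nothing matches.
 */
static size_t
sketch_min_page_size(const struct sketch_list *lists, size_t n_lists,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n_lists == 0);

	for (i = 0; i < n_lists; i++) {
		bool same_socket = lists[i].socket_id == socket_id;
		bool any_internal = socket_id == SKETCH_SOCKET_ID_ANY &&
				!lists[i].external;

		if (!same_socket && !any_internal)
			continue;
		if (lists[i].page_sz < min)
			min = lists[i].page_sz;
	}
	return min == SIZE_MAX ? fallback : min;
}
```

    An external list is only considered when its own socket ID is
    requested explicitly, so its page size cannot affect mempools
    created on other sockets.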

 drivers/bus/fslmc/fslmc_vfio.c                |  7 +++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 +++--
 lib/librte_eal/common/eal_common_memory.c     |  4 +++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 ++++++
 lib/librte_eal/common/malloc_heap.c           |  9 ++++--
 lib/librte_eal/linuxapp/eal/eal.c             |  3 ++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 ++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 +++++++---
 lib/librte_mempool/rte_mempool.c              | 31 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 17 files changed, 102 insertions(+), 20 deletions(-)

diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ec63bc6e2..d9ed15880 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b2444096c..885c59c8a 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0868bf681..55a11bf4d 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
@@ -547,6 +550,7 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
 	return ret;
 }
 
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..76faf9a4a 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	bool external; /**< true if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index c4b7f4cff..b381d1cb6 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 12aaf2d72..8c37b9d7c 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -756,8 +759,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..729ae2060 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index d040a2f71..8b0bbe43f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1250,6 +1250,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1298,6 +1301,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1328,6 +1334,9 @@ secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
 	int msl_idx;
 	int *data;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..4eae7bec6 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && !msl->external;
+
+	if (!valid)
+		return 0;
+
+	if (msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +485,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1


* [dpdk-dev] [PATCH 03/16] malloc: index heaps using heap ID rather than NUMA node
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 01/16] mem: add length to memseg list Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 02/16] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 04/16] mem: do not check for invalid socket ID Anatoly Burakov
                   ` (34 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bruce Richardson, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles

Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be the order of their creation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
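
A standalone sketch of a socket ID to heap ID lookup, for illustration
only. struct sketch_heap and sketch_socket_to_heap_id() are
hypothetical names; the in_use flag is an addition of this sketch (it
is not part of the patch) so that zero-initialised slots do not match
socket 0, and the caller is expected to check for -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heap slot; in_use is an assumption of this sketch. */
struct sketch_heap {
	unsigned int socket_id; /* NUMA socket or external heap ID */
	bool in_use;            /* false for slots never populated */
};

/*
 * Linear search for the heap that serves socket_id. Returns the
 * index of the heap in the array, or -1 if no heap matches.
 */
static int
sketch_socket_to_heap_id(const struct sketch_heap *heaps, size_t n_heaps,
		unsigned int socket_id)
{
	size_t i;

	assert(heaps != NULL || n_heaps == 0);

	for (i = 0; i < n_heaps; i++) {
		if (heaps[i].in_use && heaps[i].socket_id == socket_id)
			return (int)i;
	}
	return -1;
}
```
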
 config/common_base                            |  1 +
 config/rte_config.h                           |  1 +
 .../common/include/rte_eal_memconfig.h        |  4 +-
 .../common/include/rte_malloc_heap.h          |  1 +
 lib/librte_eal/common/malloc_heap.c           | 85 +++++++++++++------
 lib/librte_eal/common/malloc_heap.h           |  3 +
 lib/librte_eal/common/rte_malloc.c            | 41 ++++++---
 7 files changed, 94 insertions(+), 42 deletions(-)

diff --git a/config/common_base b/config/common_base
index 4bcbaf923..e96c52054 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index a8e479774..1f330c24e 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -21,6 +21,7 @@
 /****** library defines ********/
 
 /* EAL defines */
+#define RTE_MAX_HEAPS 32
 #define RTE_MAX_MEMSEG_LISTS 128
 #define RTE_MAX_MEMSEG_PER_LIST 8192
 #define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 76faf9a4a..5c6bd4bc3 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
 
-	/* Heaps of Malloc per socket */
-	struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+	/* Heaps of Malloc */
+	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {
 
 	unsigned alloc_count;
 	size_t total_size;
+	unsigned int socket_id;
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 8c37b9d7c..0a868f61d 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 	return check_flag & flags;
 }
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (heap->socket_id == socket_id)
+			return i;
+	}
+	return -1;
+}
+
 /*
  * Expand the heap with a memory area.
  */
@@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct rte_memseg_list *found_msl;
 	struct malloc_heap *heap;
-	int msl_idx;
+	int msl_idx, heap_idx;
 
 	if (msl->external)
 		return 0;
 
-	heap = &mcfg->malloc_heaps[msl->socket_id];
+	heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+	heap = &mcfg->malloc_heaps[heap_idx];
 
 	/* msl is const, so find it */
 	msl_idx = msl - mcfg->memsegs;
@@ -111,6 +127,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
+	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -563,12 +580,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 
 /* this will try lower page sizes first */
 static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
-		unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+	int socket_id;
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -590,8 +609,13 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
 	if (ret != NULL)
 		goto alloc_unlock;
 
-	if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
-			contig)) {
+	socket_id = rte_socket_id_by_idx(heap_id);
+	/* if socket ID is invalid, this is an external heap */
+	if (socket_id < 0)
+		goto alloc_unlock;
+
+	if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+			bound, contig)) {
 		ret = heap_alloc(heap, type, size, flags, align, bound, contig);
 
 		/* this should have succeeded */
@@ -607,7 +631,7 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, heap_id, i;
 	void *ret;
 
 	/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -622,22 +646,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
-			contig);
+	ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+			bound, contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
-	/* try other heaps */
+	/* try other heaps. we are only iterating through native DPDK sockets,
+	 * so external heaps won't be included.
+	 */
 	for (i = 0; i < (int) rte_socket_count(); i++) {
-		cur_socket = rte_socket_id_by_idx(i);
-		if (cur_socket == socket)
+		if (i == heap_id)
 			continue;
-		ret = heap_alloc_on_socket(type, size, cur_socket, flags,
-				align, bound, contig);
+		ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+				bound, contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -645,11 +672,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 }
 
 static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
-		size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+		unsigned int flags, size_t align, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -667,7 +694,7 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		size_t align, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, i, cur_socket, heap_id;
 	void *ret;
 
 	/* return NULL if align is not power-of-2 */
@@ -682,11 +709,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+	ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
 			contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
@@ -696,8 +725,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		cur_socket = rte_socket_id_by_idx(i);
 		if (cur_socket == socket)
 			continue;
-		ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
-				align, contig);
+		ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+				contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -919,7 +948,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 int
 malloc_heap_get_stats(struct malloc_heap *heap,
@@ -957,7 +986,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f);
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
 int
 rte_eal_malloc_heap_init(void);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..dfcdf380a 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx, ret = -1;
 
-	if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
-		return -1;
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
 
-	return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+	heap_idx = malloc_socket_to_heap_id(socket);
+	if (heap_idx < 0)
+		goto unlock;
+
+	ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+			socket_stats);
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
 }
 
 /*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int idx;
 
-	for (idx = 0; idx < rte_socket_count(); idx++) {
-		unsigned int socket = rte_socket_id_by_idx(idx);
-		fprintf(f, "Heap on socket %i:\n", socket);
-		malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		fprintf(f, "Heap id: %u\n", idx);
+		malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
 	}
 
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
 /*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
 void
 rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 {
-	unsigned int socket;
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int heap_id;
 	struct rte_malloc_socket_stats sock_stats;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
 	/* Iterate through all initialised heaps */
-	for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
-		if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
-			continue;
+	for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 
-		fprintf(f, "Socket:%u\n", socket);
+		malloc_heap_get_stats(heap, &sock_stats);
+
+		fprintf(f, "Heap id:%u\n", heap_id);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
 		fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
 	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	return;
 }
 
-- 
2.17.1


* [dpdk-dev] [PATCH 04/16] mem: do not check for invalid socket ID
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (2 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 03/16] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 05/16] flow_classify: " Anatoly Burakov
                   ` (33 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.
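
For illustration only (this is not part of the patch): a minimal
standalone sketch of the lookup that makes the range check redundant,
modelled on malloc_socket_to_heap_id() from patch 03/16. All demo_*
names, the table size and the socket ID 256 are assumptions made up
for this example; the real code walks mcfg->malloc_heaps.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_HEAPS 3 /* hypothetical stand-in for RTE_MAX_HEAPS */

/* hypothetical, trimmed-down stand-in for struct malloc_heap */
struct demo_heap {
	unsigned int socket_id;
};

/* two NUMA-node heaps plus one external heap with a made-up socket ID */
static const struct demo_heap demo_heaps[DEMO_MAX_HEAPS] = {
	{ 0 }, { 1 }, { 256 },
};

/* return the index of the heap that owns socket_id, or -1 if none does */
static int
demo_socket_to_heap_id(unsigned int socket_id)
{
	int i;

	for (i = 0; i < DEMO_MAX_HEAPS; i++) {
		if (demo_heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}

/* callers no longer compare against a NUMA node limit; a failed lookup
 * is what marks the socket ID as invalid
 */
static int
demo_socket_is_valid(unsigned int socket_id)
{
	return demo_socket_to_heap_id(socket_id) >= 0;
}

/* usage: a NUMA node, an external heap and an unknown socket ID */
static void
demo_usage(void)
{
	assert(demo_socket_to_heap_id(1) == 1);
	assert(demo_socket_to_heap_id(256) == 2);
	assert(demo_socket_to_heap_id(7) == -1);
}
```

With this shape, a socket ID above the NUMA node limit is accepted as
long as some heap owns it, which is what external heaps rely on.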

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index dfcdf380a..458c44ba6 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1


* [dpdk-dev] [PATCH 05/16] flow_classify: do not check for invalid socket ID
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (3 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 04/16] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 06/16] pipeline: " Anatoly Burakov
                   ` (32 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: Bernard Iremonger, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, thomas

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_flow_classify/rte_flow_classify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_FLOW_CLASSIFY_LOG(ERR,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1


* [dpdk-dev] [PATCH 06/16] pipeline: do not check for invalid socket ID
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (4 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 05/16] flow_classify: " Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 07/16] sched: " Anatoly Burakov
                   ` (31 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, thomas

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_pipeline/rte_pipeline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_LOG(ERR, PIPELINE,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1


* [dpdk-dev] [PATCH 07/16] sched: do not check for invalid socket ID
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (5 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 06/16] pipeline: " Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 08/16] malloc: add name to malloc heaps Anatoly Burakov
                   ` (30 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, thomas

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_sched/rte_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
 		return -1;
 
 	/* socket */
-	if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+	if (params->socket < 0)
 		return -3;
 
 	/* rate */
-- 
2.17.1


* [dpdk-dev] [PATCH 08/16] malloc: add name to malloc heaps
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (6 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 07/16] sched: " Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 09/16] malloc: add function to query socket ID of named heap Anatoly Burakov
                   ` (29 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

We will need to refer to external heaps in some way. While we use
heap IDs internally, the external API needs something more
user-friendly, so we will use a string to uniquely identify a
heap.
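
For illustration only (this is not part of the patch): a minimal
standalone sketch of how the default "socket_<N>" names are formed and
compared. The demo_* helpers are hypothetical; the 32-byte limit and
the bounded strncmp() mirror what the patches in this series use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN */

/* write the default name of a NUMA-node heap into buf; return 0 on
 * success, or -1 if the name did not fit and was truncated
 */
static int
demo_default_heap_name(char *buf, size_t len, int socket_id)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "socket_%i", socket_id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* compare two heap names, never reading past the maximum name length */
static int
demo_heap_name_equal(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return strncmp(a, b, DEMO_HEAP_NAME_MAX_LEN) == 0;
}
```

Because snprintf() always terminates the buffer, a name that does not
fit is truncated rather than overflowing, and the helper reports that
case to the caller.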

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 15 ++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	size_t total_size;
 	unsigned int socket_id;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 0a868f61d..813961f0c 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1011,6 +1010,20 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	/* assign names to default DPDK heaps */
+	for (i = 0; i < rte_socket_count(); i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+		char heap_name[RTE_HEAP_NAME_MAX_LEN];
+		int socket_id = rte_socket_id_by_idx(i);
+
+		snprintf(heap_name, sizeof(heap_name) - 1,
+				"socket_%i", socket_id);
+		strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+		heap->socket_id = socket_id;
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 458c44ba6..0515d47f3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1


* [dpdk-dev] [PATCH 09/16] malloc: add function to query socket ID of named heap
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (7 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 08/16] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 10/16] malloc: allow creating malloc heaps Anatoly Burakov
                   ` (28 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

When we create external heaps, each will have its own "fake"
socket ID, so add a function that maps a heap name to its
socket ID.
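
For illustration only (this is not part of the patch): a minimal
standalone sketch of the name-to-socket lookup. It uses plain errno
instead of rte_errno, a fixed table instead of mcfg->malloc_heaps, and
takes no lock; the demo_* names and the heap "my_ext_heap" with socket
ID 256 are assumptions made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN */
#define DEMO_MAX_HEAPS 3          /* hypothetical table size */

struct demo_heap {
	char name[DEMO_HEAP_NAME_MAX_LEN];
	int socket_id;
};

static const struct demo_heap demo_heaps[DEMO_MAX_HEAPS] = {
	{ "socket_0", 0 },
	{ "socket_1", 1 },
	{ "my_ext_heap", 256 },
};

/* length of s, but never look at more than max characters */
static size_t
demo_bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	assert(s != NULL);
	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/* return the socket ID of the named heap, or -1 with errno set */
static int
demo_heap_get_socket(const char *name)
{
	size_t len;
	int i;

	if (name == NULL) {
		errno = EINVAL;
		return -1;
	}
	len = demo_bounded_len(name, DEMO_HEAP_NAME_MAX_LEN);
	if (len == 0 || len == DEMO_HEAP_NAME_MAX_LEN) {
		errno = EINVAL; /* empty or too long */
		return -1;
	}
	for (i = 0; i < DEMO_MAX_HEAPS; i++) {
		/* compare against the table entry being visited */
		if (strncmp(name, demo_heaps[i].name,
				DEMO_HEAP_NAME_MAX_LEN) == 0)
			return demo_heaps[i].socket_id;
	}
	errno = ENOENT;
	return -1;
}
```

The returned value can then be passed wherever a socket ID is
expected, for example as the socket argument of rte_malloc_socket().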

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 37 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..8870732a6 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,20 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Find socket ID corresponding to a named heap.
+ *
+ * @param name
+ *   Heap name to find socket ID for
+ * @return
+ *   Socket ID in case of success (a non-negative number)
+ *   -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``name`` was NULL, empty or too long
+ *     ENOENT - heap identified by the name ``name`` was not found
+ */
+int __rte_experimental
+rte_malloc_heap_get_socket(const char *name);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 0515d47f3..b789333b3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <sys/queue.h>
 
+#include <rte_errno.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
 #include <rte_eal.h>
@@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f)
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
+int
+rte_malloc_heap_get_socket(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int idx;
+	int ret;
+
+	if (name == NULL ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) {
+			heap = tmp;
+			break;
+		}
+	}
+
+	if (heap != NULL) {
+		ret = heap->socket_id;
+	} else {
+		rte_errno = ENOENT;
+		ret = -1;
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 344a43d32..6fd729b8b 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -311,6 +311,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1


* [dpdk-dev] [PATCH 10/16] malloc: allow creating malloc heaps
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (8 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 09/16] malloc: add function to query socket ID of named heap Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 11/16] malloc: allow destroying heaps Anatoly Burakov
                   ` (27 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

Add an API to allow creating new malloc heaps. They will be created
with socket IDs starting at the larger of 256 and
RTE_MAX_NUMA_NODES, to avoid clashing with internal heaps.
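
For illustration only (this is not part of the patch): a minimal
standalone sketch of the creation path, showing the slot search and the
socket ID assignment. It uses plain errno instead of rte_errno and
takes no lock; the demo_* names, the table size of 3 and the NUMA node
limit of 8 are assumptions made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_NUMA_NODES 8     /* hypothetical build-time value */
#define DEMO_MAX_HEAPS 3          /* hypothetical table size */
#define DEMO_HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN */
#define DEMO_CONST_MAX(a, b) ((a) > (b) ? (a) : (b))
#define DEMO_EXT_MIN_SOCKET_ID DEMO_CONST_MAX(1 << 8, DEMO_MAX_NUMA_NODES)

struct demo_heap {
	char name[DEMO_HEAP_NAME_MAX_LEN];
	int socket_id;
};

static struct demo_heap demo_heaps[DEMO_MAX_HEAPS];
static uint32_t demo_next_socket_id = DEMO_EXT_MIN_SOCKET_ID;

/* create a named heap; return 0 on success, -1 with errno set */
static int
demo_heap_create(const char *name)
{
	struct demo_heap *slot = NULL;
	int i;

	if (name == NULL || name[0] == '\0' ||
			strlen(name) >= DEMO_HEAP_NAME_MAX_LEN) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < DEMO_MAX_HEAPS; i++) {
		struct demo_heap *tmp = &demo_heaps[i];

		if (strncmp(name, tmp->name, DEMO_HEAP_NAME_MAX_LEN) == 0) {
			errno = EEXIST; /* name already taken */
			return -1;
		}
		if (tmp->name[0] == '\0') {
			slot = tmp; /* first empty slot */
			break;
		}
	}
	if (slot == NULL || demo_next_socket_id > INT32_MAX) {
		errno = ENOSPC; /* table full or socket IDs exhausted */
		return -1;
	}
	/* external IDs must never overlap NUMA node IDs */
	assert(demo_next_socket_id >= (uint32_t)DEMO_MAX_NUMA_NODES);

	strncpy(slot->name, name, DEMO_HEAP_NAME_MAX_LEN - 1);
	slot->name[DEMO_HEAP_NAME_MAX_LEN - 1] = '\0';
	slot->socket_id = (int)demo_next_socket_id++;
	return 0;
}
```

In this sketch the first external heap gets socket ID 256 and each
later one gets the next integer, so IDs are never reused.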

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 19 ++++++++
 lib/librte_eal/common/malloc_heap.c        | 30 +++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 52 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 105 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8870732a6..182afab1c 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 813961f0c..2742f7b11 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b)) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1006,6 +1010,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	static uint32_t next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b789333b3..ade5af406 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -286,3 +287,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 6fd729b8b..c93dcf1a3 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -311,6 +311,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1


* [dpdk-dev] [PATCH 11/16] malloc: allow destroying heaps
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (9 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 10/16] malloc: allow creating malloc heaps Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 12/16] malloc: allow adding memory to named heaps Anatoly Burakov
                   ` (26 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

Add an API to destroy a specified heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++
 lib/librte_eal/common/malloc_heap.c        | 22 ++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 58 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 107 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 182afab1c..8a8cc1e6d 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
 
+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ *   from the heap has been freed back to the heap
+ *
+ * @note This function will return a failure result if not all memory segments
+ *   were removed from the heap prior to its destruction
+ *
+ * @param heap_name
+ *   Name of the heap to destroy.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     ENOENT - heap by the name of ``heap_name`` was not found
+ *     EPERM  - attempting to destroy reserved heap
+ *     EBUSY  - heap still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 2742f7b11..471094cd1 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1036,6 +1036,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	return 0;
 }
 
+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+	if (heap->alloc_count != 0) {
+		RTE_LOG(ERR, EAL, "Heap is still in use\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->first != NULL || heap->last != NULL) {
+		RTE_LOG(ERR, EAL, "Heap still contains memory segments\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->total_size != 0)
+		RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+	/* after this, the lock will be dropped */
+	memset(heap, 0, sizeof(*heap));
+
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index eebee16dc..75278da3c 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index ade5af406..d135f9730 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -288,6 +288,21 @@ rte_malloc_virt2iova(const void *addr)
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
 
+static struct malloc_heap *
+find_named_heap(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN))
+			return heap;
+	}
+	return NULL;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -338,3 +353,46 @@ rte_malloc_heap_create(const char *heap_name)
 
 	return ret;
 }
+
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find the heap by name */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to destroy internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	/* sanity checks done, now we can destroy the heap */
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_destroy(heap);
+
+	/* if we failed, lock is still active */
+	if (ret < 0)
+		rte_spinlock_unlock(&heap->lock);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index c93dcf1a3..1d3ca0716 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -312,6 +312,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_create;
+	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1


* [dpdk-dev] [PATCH 12/16] malloc: allow adding memory to named heaps
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (10 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 11/16] malloc: allow destroying heaps Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 13/16] malloc: allow removing memory from " Anatoly Burakov
                   ` (25 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

Add an API to add externally allocated memory to a malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If an IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
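Note: the intended layout is one memseg per page of the added area. The
following is a minimal standalone sketch of that layout, not the DPDK code
itself. struct toy_page, TOY_BAD_IOVA and toy_fill_pages() are hypothetical
names made up for illustration; the all-ones value used for TOY_BAD_IOVA is
an assumption about how RTE_BAD_IOVA is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real types are rte_iova_t and rte_memseg. */
#define TOY_BAD_IOVA UINT64_MAX

struct toy_page {
	void *addr;
	uint64_t iova;
	size_t len;
};

/*
 * Describe n_pages pages of page_sz bytes each, starting at va_addr.
 * Page i starts at va_addr + i * page_sz. Its IOVA is taken from the
 * caller's table, or set to TOY_BAD_IOVA when no table is supplied.
 */
static void
toy_fill_pages(struct toy_page *pages, void *va_addr,
		const uint64_t *iova_addrs, unsigned int n_pages,
		size_t page_sz)
{
	unsigned int i;

	for (i = 0; i < n_pages; i++) {
		pages[i].addr = (char *)va_addr + (size_t)i * page_sz;
		pages[i].iova = iova_addrs == NULL ?
				TOY_BAD_IOVA : iova_addrs[i];
		pages[i].len = page_sz;
	}
}
```

A caller without an IOVA table passes NULL and gets every page marked with
the bad-IOVA value, which is the behaviour described in the commit message.
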
 lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 74 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 51 +++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 169 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8a8cc1e6d..47f867a05 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,45 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Add memory chunk to a heap with specified name.
+ *
+ * @note Multiple memory chunks can be added to the same heap
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ *   malloc heap. Failing to do so will result in undefined behavior, up to and
+ *   including segmentation faults.
+ *
+ * @note Calling this function will erase any contents already present at the
+ *   supplied memory address.
+ *
+ * @param heap_name
+ *   Name of the heap to add memory chunk to
+ * @param va_addr
+ *   Start of virtual area to add to the heap
+ * @param len
+ *   Length of virtual area to add to the heap
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to add memory to a reserved heap
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 471094cd1..af2476504 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1010,6 +1010,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	char fbarray_name[RTE_FBARRAY_NAME_LEN];
+	struct rte_memseg_list *msl = NULL;
+	struct rte_fbarray *arr;
+	size_t seg_len = n_pages * page_sz;
+	unsigned int i;
+
+	/* first, find a free memseg list */
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+		if (tmp->base_va == NULL) {
+			msl = tmp;
+			break;
+		}
+	}
+	if (msl == NULL) {
+		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+			heap->name, va_addr);
+
+	/* create the backing fbarray */
+	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+			sizeof(struct rte_memseg)) < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+		return -1;
+	}
+	arr = &msl->memseg_arr;
+
+	/* fbarray created, fill it up */
+	for (i = 0; i < n_pages; i++) {
+		struct rte_memseg *ms;
+
+		rte_fbarray_set_used(arr, i);
+		ms = rte_fbarray_get(arr, i);
+		ms->addr = RTE_PTR_ADD(va_addr, i * page_sz);
+		ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+		ms->hugepage_sz = page_sz;
+		ms->len = page_sz;
+		ms->nchannel = rte_memory_get_nchannel();
+		ms->nrank = rte_memory_get_nrank();
+		ms->socket_id = heap->socket_id;
+	}
+
+	/* set up the memseg list */
+	msl->base_va = va_addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = heap->socket_id;
+	msl->len = seg_len;
+	msl->version = 0;
+	msl->external = true;
+
+	/* erase contents of new memory */
+	memset(va_addr, 0, seg_len);
+
+	/* now, add newly minted memory to the malloc heap */
+	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+	heap->total_size += seg_len;
+
+	/* all done! */
+	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+			heap->name, va_addr);
+
+	return 0;
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 75278da3c..237ce9dc2 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index d135f9730..329524ac9 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -303,6 +303,57 @@ find_named_heap(const char *name)
 	return NULL;
 }
 
+int
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int n;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL ||
+			page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		/* the hotplug lock is not held yet, so return directly */
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot add memory to internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	n = len / page_sz;
+	if (n != n_pages && iova_addrs != NULL) {
+		rte_errno = EINVAL;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+			page_sz);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 1d3ca0716..0d052d20a 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -314,6 +314,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_create;
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_memory_add;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1


* [dpdk-dev] [PATCH 13/16] malloc: allow removing memory from named heaps
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (11 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 12/16] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 14/16] malloc: allow attaching to external memory chunks Anatoly Burakov
                   ` (24 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
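Note: the removal checks described above can be modelled in a few lines.
The following is a minimal standalone sketch, not the DPDK code itself.
struct toy_elem and toy_can_remove() are hypothetical names made up for
illustration; region_len plays the role of the length recorded in the
underlying memseg list, and errors are returned as negative errno values
instead of being stored in rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_state { TOY_FREE, TOY_BUSY };

/* Hypothetical stand-in for struct malloc_elem plus its memseg list. */
struct toy_elem {
	void *addr;
	size_t size;       /* size of this element */
	size_t region_len; /* length of the region originally added */
	enum toy_state state;
	struct toy_elem *next;
};

/*
 * Returns 0 if the region at va_addr of length len may be removed,
 * -ENOENT if no matching region exists, -EBUSY if part of it is in use.
 */
static int
toy_can_remove(const struct toy_elem *first, const void *va_addr, size_t len)
{
	const struct toy_elem *elem = first;

	while (elem != NULL && elem->addr != va_addr)
		elem = elem->next;
	/* the region must be the one that was originally added */
	if (elem == NULL || elem->region_len != len)
		return -ENOENT;
	/* one free element spanning the whole region means nothing is in use */
	if (elem->state == TOY_BUSY || elem->size != len)
		return -EBUSY;
	return 0;
}
```

An element that is free but smaller than the region means the region has
been split by an allocation somewhere inside it, so it is reported as busy.
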
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++
 lib/librte_eal/common/malloc_heap.c        | 54 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 125 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 47f867a05..9bbe8e3af 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -302,6 +302,33 @@ int __rte_experimental
 rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+/**
+ * Remove memory chunk from heap with specified name.
+ *
+ * @note Memory chunk being removed must be the same as one that was added;
+ *   partially removing memory chunks is not supported
+ *
+ * @note Memory area must not contain any allocated elements to allow its
+ *   removal from the heap
+ *
+ * @param heap_name
+ *   Name of the heap to remove memory from
+ * @param va_addr
+ *   Virtual address to remove from the heap
+ * @param len
+ *   Length of virtual area to remove from the heap
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to remove memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ *     EBUSY  - memory chunk still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index af2476504..7d1d4a290 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1010,6 +1010,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+	struct malloc_heap *heap = elem->heap;
+	struct rte_memseg_list *msl;
+
+	msl = elem->msl;
+
+	/* this element can be removed */
+	malloc_elem_free_list_remove(elem);
+	malloc_elem_hide_region(elem, elem, len);
+
+	heap->total_size -= len;
+
+	memset(elem, 0, sizeof(*elem));
+
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -1084,6 +1110,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	return 0;
 }
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len)
+{
+	struct malloc_elem *elem = heap->first;
+
+	/* find element with specified va address */
+	while (elem != NULL && elem != va_addr) {
+		elem = elem->next;
+		/* stop if we've blown past our VA */
+		if (elem > (struct malloc_elem *)va_addr) {
+			rte_errno = ENOENT;
+			return -1;
+		}
+	}
+	/* check if element was found */
+	if (elem == NULL || elem->msl->len != len) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+	/* if element's size is not equal to segment len, segment is busy */
+	if (elem->state == ELEM_BUSY || elem->size != len) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+	return destroy_seg(elem, len);
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 237ce9dc2..e48996d52 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -43,6 +43,10 @@ int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 329524ac9..5093c4a46 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -354,6 +354,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot remove memory from internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 0d052d20a..f10c34130 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -315,6 +315,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_remove;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1


* [dpdk-dev] [PATCH 14/16] malloc: allow attaching to external memory chunks
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (12 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 13/16] malloc: allow removing memory from " Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 15/16] malloc: allow detaching from external memory Anatoly Burakov
                   ` (23 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
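Note: attaching is implemented as a walk over the memseg lists with a
callback that stops at the first match. The following is a minimal
standalone sketch of that pattern, not the DPDK code itself. All toy_*
names are hypothetical and made up for illustration; the real code walks
struct rte_memseg_list entries and attaches the backing fbarray instead of
setting a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_memseg_list. */
struct toy_list {
	void *base_va;
	size_t len;
	int attached;
};

struct toy_walk_arg {
	void *va_addr;
	size_t len;
	int result;
};

typedef int (*toy_walk_fn)(struct toy_list *list, void *arg);

/* Visit each list in turn; stop as soon as the callback returns non-zero. */
static int
toy_walk(struct toy_list *lists, size_t n, toy_walk_fn fn, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = fn(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Attach to the list that matches the requested address and length. */
static int
toy_attach_cb(struct toy_list *list, void *arg)
{
	struct toy_walk_arg *wa = arg;

	if (list->base_va != wa->va_addr || list->len != wa->len)
		return 0; /* not ours, keep walking */
	list->attached = 1;
	wa->result = 0;
	return 1; /* found, stop the walk */
}

/* Returns 0 on success or -ENOENT if no list matched. */
static int
toy_attach(struct toy_list *lists, size_t n, void *va_addr, size_t len)
{
	/* fail unless the callback explicitly reports success */
	struct toy_walk_arg wa = { va_addr, len, -ENOENT };

	toy_walk(lists, n, toy_attach_cb, &wa);
	return wa.result;
}
```

Pre-setting the result to -ENOENT means that a walk which never finds a
matching list needs no extra bookkeeping to report the failure.
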
 lib/librte_eal/common/include/rte_malloc.h | 32 +++++++++
 lib/librte_eal/common/rte_malloc.c         | 83 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 116 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 9bbe8e3af..37af0e481 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket,
  *
  * @note Multiple memory chunks can be added to the same heap
  *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling
+ *   ``rte_malloc_heap_memory_attach`` in each other process.
+ *
  * @note Memory must be previously allocated for DPDK to be able to use it as a
  *   malloc heap. Failing to do so will result in undefined behavior, up to and
  *   including segmentation faults.
@@ -329,12 +333,38 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 int __rte_experimental
 rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Attach to a chunk of external memory that was already added by another process.
+ *
+ * @note This function must be called before any attempt is made to use an
+ *   already existing external memory chunk. This function does *not* need to
+ *   be called if a call to ``rte_malloc_heap_memory_add`` was made in the
+ *   current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to attach to
+ * @param len
+ *   Length of memory chunk to attach to
+ * @return
+ *   0 on successful attach
+ *   -1 on unsuccessful attach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to attach memory to a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
  * @note Heaps created via this call will automatically get assigned a unique
  *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
  *
+ * @note This function has to only be called in one process.
+ *
  * @param heap_name
  *   Name of the heap to create.
  *
@@ -357,6 +387,8 @@ rte_malloc_heap_create(const char *heap_name);
  * @note This function will return a failure result if not all memory segments
  *   were removed from the heap prior to its destruction
  *
+ * @note This function has to only be called in one process.
+ *
  * @param heap_name
  *   Name of the heap to destroy.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5093c4a46..2ed173466 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -393,6 +393,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+struct sync_mem_walk_arg {
+	void *va_addr;
+	size_t len;
+	int result;
+};
+
+static int
+attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct sync_mem_walk_arg *wa = arg;
+	size_t len = msl->page_sz * msl->memseg_arr.len;
+
+	if (msl->base_va == wa->va_addr &&
+			len == wa->len) {
+		struct rte_memseg_list *found_msl;
+		int msl_idx, ret;
+
+		/* msl is const, so get a writable pointer via its index */
+		msl_idx = msl - mcfg->memsegs;
+		found_msl = &mcfg->memsegs[msl_idx];
+
+		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+
+		if (ret < 0)
+			wa->result = -rte_errno;
+		else
+			wa->result = 0;
+		return 1;
+	}
+	return 0;
+}
+
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	struct sync_mem_walk_arg wa;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to attach to internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* find corresponding memseg list to attach to */
+	wa.va_addr = va_addr;
+	wa.len = len;
+	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+
+	/* we're already holding a read lock */
+	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+
+	if (wa.result < 0) {
+		rte_errno = -wa.result;
+		ret = -1;
+	} else {
+		ret = 0;
+	}
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index f10c34130..822c5693a 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -315,6 +315,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_attach;
 	rte_malloc_heap_memory_remove;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1


* [dpdk-dev] [PATCH 15/16] malloc: allow detaching from external memory
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (13 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 14/16] malloc: allow attaching to external memory chunks Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 16/16] test: add unit tests for external memory support Anatoly Burakov
                   ` (22 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

Add an API to detach from an existing chunk of external memory in a
process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 27 ++++++++++++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 27 ++++++++++++++++++----
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 37af0e481..0794f58e5 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
  * @note Memory area must not contain any allocated elements to allow its
  *   removal from the heap
  *
+ * @note All other processes must detach from the memory chunk prior to it being
+ *   removed from the heap.
+ *
  * @param heap_name
  *   Name of the heap to remove memory from
  * @param va_addr
@@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 int __rte_experimental
 rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Detach from a chunk of external memory in a secondary process.
+ *
+ * @note This function must be called before any attempt is made to remove
+ *   external memory from the heap in another process. This function does *not*
+ *   need to be called if ``rte_malloc_heap_memory_remove`` will be called in
+ *   the current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to detach from
+ * @param len
+ *   Length of memory chunk to detach from
+ * @return
+ *   0 on successful detach
+ *   -1 on unsuccessful detach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to detach memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 2ed173466..08571e5a0 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -397,10 +397,11 @@ struct sync_mem_walk_arg {
 	void *va_addr;
 	size_t len;
 	int result;
+	bool attach;
 };
 
 static int
-attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct sync_mem_walk_arg *wa = arg;
@@ -415,7 +416,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		if (wa->attach)
+			ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		else
+			ret = rte_fbarray_detach(&found_msl->memseg_arr);
 
 		if (ret < 0)
 			wa->result = -rte_errno;
@@ -426,8 +430,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 	return 0;
 }
 
-int
-rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+static int
+sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
@@ -461,9 +465,10 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	wa.va_addr = va_addr;
 	wa.len = len;
 	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+	wa.attach = attach;
 
 	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
 
 	if (wa.result < 0) {
 		rte_errno = -wa.result;
@@ -476,6 +481,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, true);
+}
+
+int
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, false);
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 822c5693a..73fecb912 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -316,6 +316,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
 	rte_malloc_heap_memory_attach;
+	rte_malloc_heap_memory_detach;
 	rte_malloc_heap_memory_remove;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1
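
The refactoring in this patch folds attach and detach into one walker that is
selected by a flag, with the result preset to -ENOENT so that the call fails
unless a matching region is found. The following is only a minimal,
self-contained sketch of that pattern. The toy_* types and functions are
hypothetical stand-ins and are not part of DPDK; the real code walks memseg
lists and calls rte_fbarray_attach()/rte_fbarray_detach() instead of toggling
a boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memseg list; NOT a DPDK type. */
struct toy_region {
	void *base;
	size_t len;
	bool attached;
};

/* Mirrors the shape of sync_mem_walk_arg from the patch. */
struct toy_walk_arg {
	void *va_addr;
	size_t len;
	int result;
	bool attach; /* true = attach, false = detach */
};

typedef int (*toy_walk_fn)(struct toy_region *r, void *arg);

/* Call fn on every region; stop as soon as fn returns nonzero. */
static int
toy_walk(struct toy_region *regions, size_t n, toy_walk_fn fn, void *arg)
{
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < n; i++) {
		int ret = fn(&regions[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* One callback serves both directions; the flag picks the action. */
static int
toy_sync_walk(struct toy_region *r, void *arg)
{
	struct toy_walk_arg *wa = arg;

	if (r->base != wa->va_addr || r->len != wa->len)
		return 0; /* not our region, keep walking */

	r->attached = wa->attach;
	wa->result = 0;
	return 1; /* found it, stop the walk */
}

static int
toy_sync(struct toy_region *regions, size_t n, void *va, size_t len,
		bool attach)
{
	struct toy_walk_arg wa = {
		.va_addr = va,
		.len = len,
		.result = -ENOENT, /* fail unless a region matches */
		.attach = attach,
	};

	toy_walk(regions, n, toy_sync_walk, &wa);
	return wa.result;
}

/* Thin public wrappers, as in the patch. */
static int
toy_attach(struct toy_region *regions, size_t n, void *va, size_t len)
{
	return toy_sync(regions, n, va, len, true);
}

static int
toy_detach(struct toy_region *regions, size_t n, void *va, size_t len)
{
	return toy_sync(regions, n, va, len, false);
}
```

Keeping the two entry points as thin wrappers means the lookup and error
handling exist in one place only, which is presumably why the patch took this
shape.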


* [dpdk-dev] [PATCH 16/16] test: add unit tests for external memory support
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (14 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 15/16] malloc: allow detaching from external memory Anatoly Burakov
@ 2018-09-04 13:11 ` Anatoly Burakov
  2018-09-13  7:44 ` [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Shahaf Shuler
                   ` (21 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-04 13:11 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas

Add simple unit tests to test external memory support.
The tests are pretty basic and mostly consist of checking
if invalid API calls are handled correctly, plus a simple
allocation/deallocation test for malloc and memzone.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 test/test/Makefile            |   1 +
 test/test/autotest_data.py    |  14 +-
 test/test/meson.build         |   1 +
 test/test/test_external_mem.c | 384 ++++++++++++++++++++++++++++++++++
 4 files changed, 396 insertions(+), 4 deletions(-)
 create mode 100644 test/test/test_external_mem.c

diff --git a/test/test/Makefile b/test/test/Makefile
index e6967bab6..074ac6e03 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c
 SRCS-y += test_reciprocal_division.c
 SRCS-y += test_reciprocal_division_perf.c
 SRCS-y += test_fbarray.c
+SRCS-y += test_external_mem.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py
index f68d9b111..51f8e1689 100644
--- a/test/test/autotest_data.py
+++ b/test/test/autotest_data.py
@@ -477,10 +477,16 @@
         "Report":  None,
     },
     {
-        "Name":    "Fbarray autotest",
-        "Command": "fbarray_autotest",
-        "Func":    default_autotest,
-        "Report":  None,
+	"Name":    "Fbarray autotest",
+	"Command": "fbarray_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
+    },
+    {
+	"Name":    "External memory autotest",
+	"Command": "external_mem_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
     },
     #
     #Please always keep all dump tests at the end and together!
diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6eca2..3abf02b71 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -155,6 +155,7 @@ test_names = [
 	'eventdev_common_autotest',
 	'eventdev_octeontx_autotest',
 	'eventdev_sw_autotest',
+	'external_mem_autotest',
 	'func_reentrancy_autotest',
 	'flow_classify_autotest',
 	'hash_scaling_autotest',
diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c
new file mode 100644
index 000000000..5edb5c348
--- /dev/null
+++ b/test/test/test_external_mem.c
@@ -0,0 +1,384 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+
+#include "test.h"
+
+#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */
+
+static int
+test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova,
+		int n_pages)
+{
+	static const char * const names[] = {
+		NULL, /* NULL name */
+		"",   /* empty name */
+		"this heap name is definitely way too long to be valid"
+	};
+	const char *valid_name = "valid heap name";
+	unsigned int i;
+
+	/* check invalid name handling */
+	for (i = 0; i < RTE_DIM(names); i++) {
+		const char *name = names[i];
+
+		/* these calls may fail for other reasons, so check errno */
+		if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Created heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Destroyed heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_get_socket(name) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Found socket for heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_add(name, addr, len,
+				NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Added memory to heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Removed memory from heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Attached memory to heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Detached memory from heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* do same as above, but with a valid heap name */
+
+	/* skip create call */
+	if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Destroyed non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_get_socket(valid_name) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Found socket for non-existent heap\n",
+				__func__, __LINE__);
+		goto fail;
+	}
+
+	/* these calls may fail for other reasons, so check errno */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Added memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Removed memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Attached memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Detached memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* create a valid heap but test other invalid parameters */
+	if (rte_malloc_heap_create(valid_name) != 0) {
+		printf("%s():%i: Failed to create valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero length */
+	if (rte_malloc_heap_memory_add(valid_name, addr, 0,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero address */
+	if (rte_malloc_heap_memory_add(valid_name, NULL, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* wrong page count */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* tests passed, destroy heap */
+	if (rte_malloc_heap_destroy(valid_name) != 0) {
+		printf("%s():%i: Failed to destroy valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	return 0;
+fail:
+	rte_malloc_heap_destroy(valid_name);
+	return -1;
+}
+
+static int
+test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages)
+{
+	const char *heap_name = "heap";
+	void *ptr = NULL;
+	int socket_id, i;
+	const struct rte_memzone *mz = NULL;
+
+	/* create heap */
+	if (rte_malloc_heap_create(heap_name) != 0) {
+		printf("%s():%i: Failed to create malloc heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* get socket ID corresponding to this heap */
+	socket_id = rte_malloc_heap_get_socket(heap_name);
+	if (socket_id < 0) {
+		printf("%s():%i: cannot find socket for external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* heap is empty, so any allocation should fail */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr != NULL) {
+		printf("%s():%i: Allocated from empty heap\n", __func__,
+			__LINE__);
+		goto fail;
+	}
+
+	/* add memory to heap */
+	if (rte_malloc_heap_memory_add(heap_name, addr, len,
+			iova, n_pages, pgsz) != 0) {
+		printf("%s():%i: Failed to add memory to heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check that we can get this memory from EAL now */
+	for (i = 0; i < n_pages; i++) {
+		const struct rte_memseg *ms;
+
+		ms = rte_mem_virt2memseg(RTE_PTR_ADD(addr, pgsz * i), NULL);
+		if (ms == NULL) {
+			printf("%s():%i: Failed to retrieve memseg for external mem\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (ms->iova != iova[i]) {
+			printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* allocate - this now should succeed */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr == NULL) {
+		printf("%s():%i: Failed to allocate from external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check if address is in expected range */
+	if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) {
+		printf("%s():%i: Allocated from unexpected address space\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* we've allocated something - removing memory should fail */
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 ||
+			rte_errno != EBUSY) {
+		printf("%s():%i: Removing memory succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) {
+		printf("%s():%i: Destroying heap succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* try allocating an IOVA-contiguous memzone - this should succeed
+	 * because we've set up a contiguous IOVA table.
+	 */
+	mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id,
+			RTE_MEMZONE_IOVA_CONTIG);
+	if (mz == NULL) {
+		printf("%s():%i: Failed to reserve memzone\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	rte_malloc_dump_stats(stdout, NULL);
+	rte_malloc_dump_heaps(stdout);
+
+	/* free memory - removing it should now succeed */
+	rte_free(ptr);
+	ptr = NULL;
+
+	rte_memzone_free(mz);
+	mz = NULL;
+
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) {
+		printf("%s():%i: Removing memory from heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) != 0) {
+		printf("%s():%i: Destroying heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	return 0;
+fail:
+	rte_memzone_free(mz);
+	rte_free(ptr);
+	/* even if something failed, attempt to clean up */
+	rte_malloc_heap_memory_remove(heap_name, addr, len);
+	rte_malloc_heap_destroy(heap_name);
+
+	return -1;
+}
+
+/* we need to test attach/detach in secondary processes. */
+static int
+test_external_mem(void)
+{
+	size_t len = EXTERNAL_MEM_SZ;
+	size_t pgsz = RTE_PGSIZE_4K;
+	rte_iova_t iova[len / pgsz];
+	void *addr;
+	int ret, n_pages;
+
+	/* create external memory area */
+	n_pages = RTE_DIM(iova);
+	addr = mmap(NULL, len, PROT_WRITE | PROT_READ,
+			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED) {
+		printf("%s():%i: Failed to create dummy memory area\n",
+			__func__, __LINE__);
+		return -1;
+	}
+	for (int i = 0; i < n_pages; i++) {
+		/* arbitrary IOVA */
+		rte_iova_t tmp = 0x100000000 + i * pgsz;
+		iova[i] = tmp;
+	}
+
+	ret = test_invalid_param(addr, len, pgsz, iova, n_pages);
+	ret |= test_basic(addr, len, pgsz, iova, n_pages);
+
+	munmap(addr, len);
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem);
-- 
2.17.1
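
The "wrong page count" cases in test_invalid_param() imply a simple rule: when
an IOVA table is supplied, its entry count must equal len / pgsz, while a NULL
table is accepted. The sketch below is a hypothetical, standalone restatement
of that rule together with the contiguous IOVA table that test_external_mem()
builds. The toy_* names are illustrative only, and the len % pgsz check is an
extra assumption that the tests above do not exercise; this is not the actual
rte_malloc_heap_memory_add() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_iova_t; /* stand-in for rte_iova_t */

/* Fill the table with IOVA-contiguous page addresses starting at base. */
static void
toy_fill_iova(toy_iova_t *table, size_t n_pages, toy_iova_t base, size_t pgsz)
{
	size_t i;

	assert(table != NULL || n_pages == 0);
	for (i = 0; i < n_pages; i++)
		table[i] = base + (toy_iova_t)i * pgsz;
}

/* Return 0 if the arguments are consistent, -EINVAL otherwise. */
static int
toy_check_add_args(const void *addr, size_t len, const toy_iova_t *iova,
		size_t n_pages, size_t pgsz)
{
	/* zero address, zero length and zero page size are all invalid */
	if (addr == NULL || len == 0 || pgsz == 0)
		return -EINVAL;
	/* assumption: length must be a whole number of pages */
	if (len % pgsz != 0)
		return -EINVAL;
	/* a supplied table must describe exactly len / pgsz pages */
	if (iova != NULL && n_pages != len / pgsz)
		return -EINVAL;
	return 0;
}
```

With a 4-page area, passing a table with 3, 5 or 0 entries is rejected, which
matches the three page count cases the unit test checks.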


* Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (15 preceding siblings ...)
  2018-09-04 13:11 ` [dpdk-dev] [PATCH 16/16] test: add unit tests for external memory support Anatoly Burakov
@ 2018-09-13  7:44 ` Shahaf Shuler
  2018-09-17 10:07   ` Burakov, Anatoly
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                   ` (20 subsequent siblings)
  37 siblings, 1 reply; 225+ messages in thread
From: Shahaf Shuler @ 2018-09-13  7:44 UTC (permalink / raw)
  To: Anatoly Burakov, dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, Thomas Monjalon

Hi Anatoly,

First, thanks for the patchset; it is a great enhancement.

See question below. 

Tuesday, September 4, 2018 4:12 PM, Anatoly Burakov:
> Subject: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in
> DPDK
> 
> This is a proposal to enable using externally allocated memory in DPDK.
> 
> In a nutshell, here is what is being done here:
> 
> - Index internal malloc heaps by NUMA node index, rather than NUMA
>   node itself (external heaps will have ID's in order of creation)
> - Add identifier string to malloc heap, to uniquely identify it
>   - Each new heap will receive a unique socket ID that will be used by
>     allocator to decide from which heap (internal or external) to
>     allocate requested amount of memory
> - Allow creating named heaps and add/remove memory to/from those
> heaps
> - Allocate memseg lists at runtime, to keep track of IOVA addresses
>   of externally allocated memory
>   - If IOVA addresses aren't provided, use RTE_BAD_IOVA
> - Allow malloc and memzones to allocate from external heaps
> - Allow other data structures to allocate from external heaps
> 
> The responsibility to ensure memory is accessible before using it is on the
> shoulders of the user - there is no checking done with regards to validity of
> the memory (nor could there be...).

That makes sense. However, who should be in charge of mapping this memory for DMA access?
The user, or the PMD internally, when it encounters the first packet or while traversing the existing mempools?

> 
> The general approach is to create heap and add memory into it. For any other
> process wishing to use the same memory, said memory must first be
> attached (otherwise some things will not work).
> 
> A design decision was made to make multiprocess synchronization a manual
> process. Due to underlying issues with attaching to fbarrays in secondary
> processes, this design was deemed to be better because we don't want to
> fail to create external heap in the primary because something in the
> secondary has failed when in fact we may not even have wanted this memory
> to be accessible in the secondary in the first place.
> 
> Using external memory in multiprocess is *hard*, because not only memory
> space needs to be preallocated, but it also needs to be attached in each
> process to allow other processes to access the page table. The attach API call
> may or may not succeed, depending on memory layout, for reasons similar to
> other multiprocess failures. This is treated as a "known issue" for this release.
> 
> RFC -> v1 changes:
> - Removed the "named heaps" API, allocate using fake socket ID instead
> - Added multiprocess support
> - Everything is now thread-safe
> - Numerous bugfixes and API improvements
> 
> Anatoly Burakov (16):
>   mem: add length to memseg list
>   mem: allow memseg lists to be marked as external
>   malloc: index heaps using heap ID rather than NUMA node
>   mem: do not check for invalid socket ID
>   flow_classify: do not check for invalid socket ID
>   pipeline: do not check for invalid socket ID
>   sched: do not check for invalid socket ID
>   malloc: add name to malloc heaps
>   malloc: add function to query socket ID of named heap
>   malloc: allow creating malloc heaps
>   malloc: allow destroying heaps
>   malloc: allow adding memory to named heaps
>   malloc: allow removing memory from named heaps
>   malloc: allow attaching to external memory chunks
>   malloc: allow detaching from external memory
>   test: add unit tests for external memory support
> 
>  config/common_base                            |   1 +
>  config/rte_config.h                           |   1 +
>  drivers/bus/fslmc/fslmc_vfio.c                |   7 +-
>  drivers/bus/pci/linux/pci.c                   |   2 +-
>  drivers/net/mlx4/mlx4_mr.c                    |   3 +
>  drivers/net/mlx5/mlx5.c                       |   5 +-
>  drivers/net/mlx5/mlx5_mr.c                    |   3 +
>  drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
>  lib/librte_eal/bsdapp/eal/eal.c               |   3 +
>  lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
>  lib/librte_eal/common/eal_common_memory.c     |   9 +-
>  lib/librte_eal/common/eal_common_memzone.c    |   8 +-
>  .../common/include/rte_eal_memconfig.h        |   6 +-
>  lib/librte_eal/common/include/rte_malloc.h    | 181 +++++++++
>  .../common/include/rte_malloc_heap.h          |   3 +
>  lib/librte_eal/common/include/rte_memory.h    |   9 +
>  lib/librte_eal/common/malloc_heap.c           | 287 +++++++++++--
>  lib/librte_eal/common/malloc_heap.h           |  17 +
>  lib/librte_eal/common/rte_malloc.c            | 383 ++++++++++++++++-
>  lib/librte_eal/linuxapp/eal/eal.c             |   3 +
>  lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
>  lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
>  lib/librte_eal/linuxapp/eal/eal_vfio.c        |  17 +-
>  lib/librte_eal/rte_eal_version.map            |   7 +
>  lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
>  lib/librte_mempool/rte_mempool.c              |  31 +-
>  lib/librte_pipeline/rte_pipeline.c            |   3 +-
>  lib/librte_sched/rte_sched.c                  |   2 +-
>  test/test/Makefile                            |   1 +
>  test/test/autotest_data.py                    |  14 +-
>  test/test/meson.build                         |   1 +
>  test/test/test_external_mem.c                 | 384 ++++++++++++++++++
>  test/test/test_malloc.c                       |   3 +
>  test/test/test_memzone.c                      |   3 +
>  34 files changed, 1346 insertions(+), 84 deletions(-)  create mode 100644
> test/test/test_external_mem.c
> 
> --
> 2.17.1


* Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK
  2018-09-13  7:44 ` [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Shahaf Shuler
@ 2018-09-17 10:07   ` Burakov, Anatoly
  2018-09-17 12:16     ` Shahaf Shuler
  0 siblings, 1 reply; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-17 10:07 UTC (permalink / raw)
  To: Shahaf Shuler, dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, Thomas Monjalon

On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:
> Hi Anatoly,
> 
> First thanks for the patchset, it is a great enhancement.
> 
> See question below.
> 
> Tuesday, September 4, 2018 4:12 PM, Anatoly Burakov:
>> Subject: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in
>> DPDK
>>
>> This is a proposal to enable using externally allocated memory in DPDK.
>>
>> In a nutshell, here is what is being done here:
>>
>> - Index internal malloc heaps by NUMA node index, rather than NUMA
>>    node itself (external heaps will have ID's in order of creation)
>> - Add identifier string to malloc heap, to uniquely identify it
>>    - Each new heap will receive a unique socket ID that will be used by
>>      allocator to decide from which heap (internal or external) to
>>      allocate requested amount of memory
>> - Allow creating named heaps and add/remove memory to/from those
>> heaps
>> - Allocate memseg lists at runtime, to keep track of IOVA addresses
>>    of externally allocated memory
>>    - If IOVA addresses aren't provided, use RTE_BAD_IOVA
>> - Allow malloc and memzones to allocate from external heaps
>> - Allow other data structures to allocate from external heaps
>>
>> The responsibility to ensure memory is accessible before using it is on the
>> shoulders of the user - there is no checking done with regards to validity of
>> the memory (nor could there be...).
> 
> That makes sense. However who should be in-charge of mapping this memory for dma access?
> The user or internally be the PMD when encounter the first packet or while traversing the existing mempools?
> 
Hi Shahaf,

There are two ways this can be solved. The first way is to perform VFIO 
mapping automatically on adding/attaching memory. The second is to force 
the user to do it manually. For now, the latter is chosen because the user 
knows best if they intend to do DMA on that memory, but I'm open to suggestions.
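
To make the manual option more concrete, here is a rough sketch of what a
user-side mapping loop could look like: walk the per-page IOVA table, merge
IOVA-contiguous pages into runs, and issue one mapping request per run. The
toy_* names and the callback signature are assumptions made for illustration;
in a VFIO setup the callback would wrap whichever DMA mapping call the
application uses, and nothing here is an existing DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping callback: returns 0 on success, negative on error. */
typedef int (*toy_map_fn)(uint64_t va, uint64_t iova, uint64_t len, void *ctx);

/* Example callback that only records what it was asked to map. */
struct toy_map_stats {
	int calls;
	uint64_t bytes;
};

static int
toy_record_map(uint64_t va, uint64_t iova, uint64_t len, void *ctx)
{
	struct toy_map_stats *s = ctx;

	(void)va;
	(void)iova;
	s->calls++;
	s->bytes += len;
	return 0;
}

/*
 * Map n_pages pages starting at addr, coalescing IOVA-contiguous pages.
 * Returns the number of runs mapped, or the first negative callback result.
 */
static int
toy_map_pages(void *addr, const uint64_t *iova, size_t n_pages, size_t pgsz,
		toy_map_fn map, void *ctx)
{
	size_t start = 0, i;
	int runs = 0;

	assert(map != NULL);
	for (i = 1; i <= n_pages; i++) {
		int ret;

		/* extend the current run while the next page is contiguous */
		if (i < n_pages && iova[i] == iova[i - 1] + pgsz)
			continue;

		/* run covers pages [start, i) */
		ret = map((uint64_t)(uintptr_t)addr + start * pgsz,
				iova[start], (uint64_t)(i - start) * pgsz, ctx);
		if (ret < 0)
			return ret;
		runs++;
		start = i;
	}
	return runs;
}
```

For a fully contiguous table, such as the one built in the unit test of patch
16, this collapses to a single mapping request covering the whole area.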

There is an issue with some devices and buses (e.g. bus/fslmc) bypassing 
EAL VFIO infrastructure and performing their own VFIO/DMA mapping magic, 
but solving that problem is outside the scope of this patchset. Those 
devices/buses should fix themselves :)

When not using VFIO, it's out of our hands anyway.

-- 
Thanks,
Anatoly


* Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK
  2018-09-17 10:07   ` Burakov, Anatoly
@ 2018-09-17 12:16     ` Shahaf Shuler
  2018-09-17 13:00       ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Shahaf Shuler @ 2018-09-17 12:16 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, Thomas Monjalon

Monday, September 17, 2018 1:07 PM, Burakov, Anatoly:
> Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory
> in DPDK
> 
> On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:

[...]

> >> The responsibility to ensure memory is accessible before using it is
> >> on the shoulders of the user - there is no checking done with regards
> >> to validity of the memory (nor could there be...).
> >
> > That makes sense. However who should be in-charge of mapping this
> memory for dma access?
> > The user or internally be the PMD when encounter the first packet or while
> traversing the existing mempools?
> >
> Hi Shahaf,
> 
> There are two ways this can be solved. The first way is to perform VFIO
> mapping automatically on adding/attaching memory. The second is to force
> user to do it manually. For now, the latter is chosen because user knows best
> if they intend to do DMA on that memory, but i'm open to suggestions.

I agree with that approach, and would add that the user knows not only whether the mempool is used for DMA, but also which ports will use it (this can affect the mapping).
However, I don't think it is generic enough to rely on VFIO only. As you said, some devices do not use VFIO for mapping but rather some proprietary driver utility.
IMO DPDK should expose generic, device-agnostic APIs to the user.

My suggestion is, instead of calling vfio_dma_map or vfio_dma_unmap, to have a generic dma_map(uint8_t port, address, len). Each driver would register its own mapping callback (which can be vfio_dma_map).
This can be done outside of this series; I am just wondering what people think of such an approach.
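
As a rough illustration of the generic dma_map(port, address, len) idea, the
sketch below keeps one mapping callback per port and dispatches to it. All
names (toy_dma_register, toy_dma_map, TOY_MAX_PORTS, the callback signature)
are hypothetical and chosen only for this example; no such API exists in DPDK
at the time of this thread, and the example driver callback just counts bytes
rather than programming an IOMMU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PORTS 32 /* arbitrary limit for the sketch */

/* Hypothetical per-driver mapping callback. */
typedef int (*toy_dma_map_cb)(void *addr, size_t len);

static toy_dma_map_cb toy_dma_cbs[TOY_MAX_PORTS];

/* A driver registers the callback that knows how to map for its port. */
static int
toy_dma_register(uint8_t port, toy_dma_map_cb cb)
{
	if (port >= TOY_MAX_PORTS || cb == NULL)
		return -EINVAL;
	toy_dma_cbs[port] = cb;
	return 0;
}

/* Device-agnostic entry point: the user never calls VFIO directly. */
static int
toy_dma_map(uint8_t port, void *addr, size_t len)
{
	if (port >= TOY_MAX_PORTS || addr == NULL || len == 0)
		return -EINVAL;
	if (toy_dma_cbs[port] == NULL)
		return -ENOTSUP; /* no driver registered for this port */
	return toy_dma_cbs[port](addr, len);
}

/* Example driver callback; a VFIO-backed driver would map here instead. */
static size_t toy_mapped_bytes;

static int
toy_vfio_like_map(void *addr, size_t len)
{
	(void)addr;
	assert(len > 0); /* guaranteed by toy_dma_map() */
	toy_mapped_bytes += len;
	return 0;
}
```

A driver with a proprietary mapping utility would register its own callback in
the same way, so the application code stays identical across devices.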

> 
> There is an issue with some devices and buses (i.e. bus/fslmc) bypassing EAL
> VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
> solving that problem is outside the scope of this patchset. Those
> devices/buses should fix themselves :)
> 
> When not using VFIO, it's out of our hands anyway.

Why? 
VFIO is not a mandatory requirement for devices in DPDK. 

> 
> --
> Thanks,
> Anatoly


* Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK
  2018-09-17 12:16     ` Shahaf Shuler
@ 2018-09-17 13:00       ` Burakov, Anatoly
  2018-09-18 12:29         ` Shreyansh Jain
  0 siblings, 1 reply; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-17 13:00 UTC (permalink / raw)
  To: Shahaf Shuler, dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, Thomas Monjalon

On 17-Sep-18 1:16 PM, Shahaf Shuler wrote:
> Monday, September 17, 2018 1:07 PM, Burakov, Anatoly:
>> Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory
>> in DPDK
>>
>> On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:
> 
> [...]
> 
>>>> The responsibility to ensure memory is accessible before using it is
>>>> on the shoulders of the user - there is no checking done with regards
>>>> to validity of the memory (nor could there be...).
>>>
>>> That makes sense. However who should be in-charge of mapping this
>> memory for dma access?
>>> The user or internally be the PMD when encounter the first packet or while
>> traversing the existing mempools?
>>>
>> Hi Shahaf,
>>
>> There are two ways this can be solved. The first way is to perform VFIO
>> mapping automatically on adding/attaching memory. The second is to force
>> user to do it manually. For now, the latter is chosen because user knows best
>> if they intend to do DMA on that memory, but i'm open to suggestions.
> 
> I agree with that approach, and will add not only if the mempool is for dma or not but also which ports will use this mempool (this can effect on the mapping).

That is perhaps too hardware-specific - this should probably be handled 
inside the driver callbacks.

> However I don't think this is generic enough to use only VFIO. As you said, there are some devices not using VFIO for mapping rather some proprietary driver utility.
> IMO DPDK should introduce generic and device agnostic APIs to the user.
> 
> My suggestion is instead of doing vfio_dma_map that or vfio_dma_unmap that have a generic dma_map(uint8_t port, address, len). Each driver will register with its own mapping callback (can be vfio_dma_map).
> It can be outside of this series, just wondering the people opinion on such approach.

I don't disagree. I don't like bus/net/etc drivers doing their own thing 
with regards to mapping, and I would by far prefer a generic way to set up 
DMA maps, to which VFIO will be a subscriber.

> 
>>
>> There is an issue with some devices and buses (i.e. bus/fslmc) bypassing EAL
>> VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
>> solving that problem is outside the scope of this patchset. Those
>> devices/buses should fix themselves :)
>>
>> When not using VFIO, it's out of our hands anyway.
> 
> Why?
> VFIO is not a must requirement for devices in DPDK.

When I say "out of our hands", what I mean is that currently, as far 
as the EAL API is concerned, there is no DMA mapping outside of VFIO.

> 
>>
>> --
>> Thanks,
>> Anatoly


-- 
Thanks,
Anatoly


* Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK
  2018-09-17 13:00       ` Burakov, Anatoly
@ 2018-09-18 12:29         ` Shreyansh Jain
  2018-09-18 15:15           ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Shreyansh Jain @ 2018-09-18 12:29 UTC (permalink / raw)
  To: Burakov, Anatoly, Shahaf Shuler
  Cc: dev, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	Thomas Monjalon

On Monday 17 September 2018 06:30 PM, Burakov, Anatoly wrote:
> On 17-Sep-18 1:16 PM, Shahaf Shuler wrote:
>> Monday, September 17, 2018 1:07 PM, Burakov, Anatoly:
>>> Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated 
>>> memory
>>> in DPDK
>>>
>>> On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:
>>
>> [...]
>>
>>>>> The responsibility to ensure memory is accessible before using it is
>>>>> on the shoulders of the user - there is no checking done with regards
>>>>> to validity of the memory (nor could there be...).
>>>>
>>>> That makes sense. However who should be in-charge of mapping this
>>> memory for dma access?
>>>> The user or internally be the PMD when encounter the first packet or 
>>>> while
>>> traversing the existing mempools?
>>>>
>>> Hi Shahaf,
>>>
>>> There are two ways this can be solved. The first way is to perform VFIO
>>> mapping automatically on adding/attaching memory. The second is to force
>>> user to do it manually. For now, the latter is chosen because user 
>>> knows best
>>> if they intend to do DMA on that memory, but i'm open to suggestions.
>>
>> I agree with that approach, and will add not only if the mempool is 
>> for dma or not but also which ports will use this mempool (this can 
>> effect on the mapping).
> 
> That is perhaps too hardware-specific - this should probably be handled 
> inside the driver callbacks.
> 
>> However I don't think this is generic enough to use only VFIO. As you 
>> said, there are some devices not using VFIO for mapping rather some 
>> proprietary driver utility.
>> IMO DPDK should introduce generic and device agnostic APIs to the user.
>>
>> My suggestion is instead of doing vfio_dma_map that or vfio_dma_unmap 
>> that have a generic dma_map(uint8_t port, address, len). Each driver 
>> will register with its own mapping callback (can be vfio_dma_map).
>> It can be outside of this series, just wondering the people opinion on 
>> such approach.
> 
> I don't disagree. I don't like bus/net/etc drivers doing their own thing 
> with regards to mapping, and i would by far prefer generic way to set up 
> DMA maps, to which VFIO will be a subscriber.
> 
>>
>>>
>>> There is an issue with some devices and buses (i.e. bus/fslmc) 
>>> bypassing EAL
>>> VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
>>> solving that problem is outside the scope of this patchset. Those
>>> devices/buses should fix themselves :)

DMA mapping is a very common principle and can easily be a candidate 
for lets-make-generic-movement, but, being close to hardware (or 
hardware specific), it does require the driver to have some flexibility 
in terms of its eventual implementation.

I maintain one of those drivers (bus/fslmc) in DPDK which needs to have 
special VFIO layer - and from that experience, I can say that VFIO 
mapping does require some flexibility. SoC semantics are sometimes too 
complex to pin to general-universally-agreed-standard concept. (or, one 
can easily call it a 'bug', while it is a 'feature' for others :D)

In fact, NXP has another driver (bus/dpaa) which doesn't even work with 
VFIO - loves to work directly with Phys_addr. And, it is not at a lower 
priority than one with VFIO.

Thus, I really don't think a strongly controlled VFIO mapping should be 
EAL's responsibility. Failure because of lack of mapping is a driver's 
problem.

>>>
>>> When not using VFIO, it's out of our hands anyway.
>>
>> Why?
>> VFIO is not a must requirement for devices in DPDK.
> 
> When i say "out of our hands", what i mean to say is, currently as far 
> as EAL API is concerned, there is no DMA mapping outside of VFIO.
> 
>>
>>>
>>> -- 
>>> Thanks,
>>> Anatoly
> 
> 

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK
  2018-09-18 12:29         ` Shreyansh Jain
@ 2018-09-18 15:15           ` Burakov, Anatoly
  0 siblings, 0 replies; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-18 15:15 UTC (permalink / raw)
  To: Shreyansh Jain, Shahaf Shuler
  Cc: dev, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	Thomas Monjalon

On 18-Sep-18 1:29 PM, Shreyansh Jain wrote:
> On Monday 17 September 2018 06:30 PM, Burakov, Anatoly wrote:
>> On 17-Sep-18 1:16 PM, Shahaf Shuler wrote:
>>> Monday, September 17, 2018 1:07 PM, Burakov, Anatoly:
>>>> Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated 
>>>> memory
>>>> in DPDK
>>>>
>>>> On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:
>>>
>>> [...]
>>>
>>>>>> The responsibility to ensure memory is accessible before using it is
>>>>>> on the shoulders of the user - there is no checking done with regards
>>>>>> to validity of the memory (nor could there be...).
>>>>>
>>>>> That makes sense. However who should be in-charge of mapping this
>>>> memory for dma access?
>>>>> The user or internally be the PMD when encounter the first packet 
>>>>> or while
>>>> traversing the existing mempools?
>>>>>
>>>> Hi Shahaf,
>>>>
>>>> There are two ways this can be solved. The first way is to perform VFIO
>>>> mapping automatically on adding/attaching memory. The second is to 
>>>> force
>>>> user to do it manually. For now, the latter is chosen because user 
>>>> knows best
>>>> if they intend to do DMA on that memory, but i'm open to suggestions.
>>>
>>> I agree with that approach, and will add not only if the mempool is 
>>> for dma or not but also which ports will use this mempool (this can 
>>> effect on the mapping).
>>
>> That is perhaps too hardware-specific - this should probably be 
>> handled inside the driver callbacks.
>>
>>> However I don't think this is generic enough to use only VFIO. As you 
>>> said, there are some devices not using VFIO for mapping rather some 
>>> proprietary driver utility.
>>> IMO DPDK should introduce generic and device agnostic APIs to the user.
>>>
>>> My suggestion is instead of doing vfio_dma_map that or vfio_dma_unmap 
>>> that have a generic dma_map(uint8_t port, address, len). Each driver 
>>> will register with its own mapping callback (can be vfio_dma_map).
>>> It can be outside of this series, just wondering the people opinion 
>>> on such approach.
>>
>> I don't disagree. I don't like bus/net/etc drivers doing their own 
>> thing with regards to mapping, and i would by far prefer generic way 
>> to set up DMA maps, to which VFIO will be a subscriber.
>>
>>>
>>>>
>>>> There is an issue with some devices and buses (i.e. bus/fslmc) 
>>>> bypassing EAL
>>>> VFIO infrastructure and performing their own VFIO/DMA mapping magic, 
>>>> but
>>>> solving that problem is outside the scope of this patchset. Those
>>>> devices/buses should fix themselves :)
> 
> DMA mapping is a very common principle and can be easily be a candidate 
> for lets-make-generic-movement, but, being close to hardware (or 
> hardware specific), it does require the driver to have some flexibility 
> in terms of its eventual implementation.

Perhaps i didn't word my response clearly enough. I didn't mean to say 
(or imply) that EAL must handle all DMA mappings itself. Rather, EAL 
should provide a generic infrastructure for maintaining current mappings 
etc., and provide a subscription mechanism for other users (e.g. 
drivers), so that the details of exactly how to map things for DMA are 
up to the drivers.

In other words, we agree :)
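
To make the idea concrete, here is a rough sketch of what such a
subscription mechanism could look like. This is only an illustration:
every name in it (dma_subscribe, dma_notify, counting_cb) is
hypothetical and none of it is existing EAL API. The generic layer only
fans map/unmap requests out to subscribers; each subscriber decides how
to actually program its hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not existing EAL API. */
#define DMA_MAX_SUBSCRIBERS 8

/* Callback invoked for every map (do_map = 1) or unmap (do_map = 0). */
typedef int (*dma_map_cb_t)(void *ctx, uint64_t vaddr, uint64_t iova,
		size_t len, int do_map);

struct dma_subscriber {
	dma_map_cb_t cb;
	void *ctx;
};

static struct dma_subscriber subscribers[DMA_MAX_SUBSCRIBERS];
static int n_subscribers;

/* A driver (or the VFIO layer) subscribes once, e.g. at probe time. */
int
dma_subscribe(dma_map_cb_t cb, void *ctx)
{
	if (cb == NULL || n_subscribers == DMA_MAX_SUBSCRIBERS)
		return -1;
	subscribers[n_subscribers].cb = cb;
	subscribers[n_subscribers].ctx = ctx;
	n_subscribers++;
	return 0;
}

/* Notify all subscribers; returns how many of them reported failure. */
int
dma_notify(uint64_t vaddr, uint64_t iova, size_t len, int do_map)
{
	int i, failed = 0;

	assert(len > 0);
	for (i = 0; i < n_subscribers; i++) {
		const struct dma_subscriber *s = &subscribers[i];

		if (s->cb(s->ctx, vaddr, iova, len, do_map) != 0)
			failed++;
	}
	return failed;
}

/* Example subscriber: only tracks how many bytes it was asked to map. */
int
counting_cb(void *ctx, uint64_t vaddr, uint64_t iova, size_t len, int do_map)
{
	size_t *mapped = ctx;

	(void)vaddr;
	(void)iova;
	if (do_map)
		*mapped += len;
	else
		*mapped -= len;
	return 0;
}
```

In this model VFIO would be just one more subscriber, and a bus such as
bus/fslmc could register its own callback instead of bypassing the
common path.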

> 
> I maintain one of those drivers (bus/fslmc) in DPDK which needs to have 
> special VFIO layer - and from that experience, I can say that VFIO 
> mapping does require some flexibility. SoC semantics are sometimes too 
> complex to pin to general-universally-agreed-standard concept. (or, one 
> can easily call it a 'bug', while it is a 'feature' for others :D)
> 
> In fact, NXP has another driver (bus/dpaa) which doesn't even work with 
> VFIO - loves to work directly with Phys_addr. And, it is not at a lower 
> priority than one with VFIO.
> 
> Thus, I really don't think a strongly controlled VFIO mapping should be 
> EAL's responsibility. Failure because of lack of mapping is a driver's 
> problem.
> 

While EAL doesn't necessarily need to be involved with mapping things 
for VFIO, i believe it does need to be the authority on what gets 
mapped. The user needs a way to make arbitrary memory available for DMA 
- this is where EAL comes in. VFIO itself can be factored out into a 
separate subsystem (DMA drivers, anyone? :D ), but given that memory 
cometh and goeth (external memory included), and given that some things 
tend to be a bit complicated [*], EAL needs to know when something is 
supposed to be mapped or unmapped, and when to notify subscribers that 
they may have to refresh their DMA maps.

[*] for example, VFIO can only do mappings whenever there are devices 
actually attached to a VFIO container, so we have to maintain all maps 
between hotplug events to ensure that memory set up for DMA doesn't 
silently get unmapped on device detach and subsequent attach.
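
A toy model of the bookkeeping described in [*], assuming a single
container and using a counter as a stand-in for the real IOMMU
programming. The struct and function names are made up for
illustration and do not correspond to the actual eal_vfio.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_USER_MAPS 16

struct user_map {
	uint64_t vaddr;
	uint64_t iova;
	size_t len;
};

/* Hypothetical model of one VFIO container. */
struct container {
	struct user_map maps[MAX_USER_MAPS]; /* records survive detach */
	int n_maps;
	int n_devices;   /* devices currently attached */
	int n_hw_mapped; /* maps currently programmed into the IOMMU */
};

/* Always record the request; only touch hardware if a device is present. */
int
container_dma_map(struct container *c, uint64_t vaddr, uint64_t iova,
		size_t len)
{
	assert(c != NULL);
	if (c->n_maps == MAX_USER_MAPS)
		return -1;
	c->maps[c->n_maps].vaddr = vaddr;
	c->maps[c->n_maps].iova = iova;
	c->maps[c->n_maps].len = len;
	c->n_maps++;
	if (c->n_devices > 0)
		c->n_hw_mapped++; /* stand-in for the real mapping call */
	return 0;
}

/* When the last device goes away, hardware mappings are lost. */
void
container_device_detach(struct container *c)
{
	assert(c != NULL);
	if (c->n_devices > 0 && --c->n_devices == 0)
		c->n_hw_mapped = 0;
}

/* When the first device arrives, replay every recorded mapping. */
void
container_device_attach(struct container *c)
{
	assert(c != NULL);
	if (c->n_devices++ == 0)
		c->n_hw_mapped = c->n_maps;
}
```

The point is that the list of maps is the source of truth, and the
hardware state is rebuilt from it on every attach.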

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v2 00/20] Support externally allocated memory in DPDK
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (16 preceding siblings ...)
  2018-09-13  7:44 ` [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Shahaf Shuler
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                     ` (20 more replies)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 01/20] mem: add length to memseg list Anatoly Burakov
                   ` (19 subsequent siblings)
  37 siblings, 21 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.

Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

Creating and destroying heaps is currently restricted to primary
processes, because we need to keep track of all socket ID's we've ever
used to prevent their reuse, and obviously different processes would
have kept different socket ID counters, and it isn't important enough
to put into shared memory. This means that secondary processes will
not be able to create new heaps. If this use case is important
enough, we can put the max socket ID into shared memory, or allow
socket ID reuse (which i do not think is a good idea because it has
the potential to make things harder to debug).
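
As an illustration of the socket ID policy above, here is a simplified
model, not the code from this series. It assumes two NUMA nodes whose
internal heaps occupy socket IDs 0 and 1; the names (heap_create,
heap_destroy, heap_get_socket) and the starting value of the counter
are assumptions made for the example. The counter lives in one process
only, which is why creation is restricted to the primary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_NUMA_NODES  2 /* assumed: internal heaps use socket IDs 0..1 */
#define MAX_EXT_HEAPS 4

struct ext_heap {
	char name[32];
	int socket_id;
	int in_use;
};

static struct ext_heap ext_heaps[MAX_EXT_HEAPS];
/* Only ever grows, so a destroyed heap's socket ID is never handed out again. */
static int next_socket_id = N_NUMA_NODES;

static struct ext_heap *
find_heap(const char *name)
{
	int i;

	for (i = 0; i < MAX_EXT_HEAPS; i++)
		if (ext_heaps[i].in_use &&
				strcmp(ext_heaps[i].name, name) == 0)
			return &ext_heaps[i];
	return NULL;
}

/* Returns the new heap's socket ID, or -1 on error. */
int
heap_create(const char *name)
{
	int i;

	if (name == NULL || name[0] == '\0' ||
			strlen(name) >= sizeof(ext_heaps[0].name))
		return -1;
	if (find_heap(name) != NULL)
		return -1; /* heap names are unique */
	for (i = 0; i < MAX_EXT_HEAPS; i++) {
		if (ext_heaps[i].in_use)
			continue;
		strcpy(ext_heaps[i].name, name);
		ext_heaps[i].socket_id = next_socket_id++;
		ext_heaps[i].in_use = 1;
		return ext_heaps[i].socket_id;
	}
	return -1; /* no free slots */
}

int
heap_destroy(const char *name)
{
	struct ext_heap *h = name != NULL ? find_heap(name) : NULL;

	if (h == NULL)
		return -1;
	h->in_use = 0; /* the slot is reusable, the socket ID is not */
	return 0;
}

/* Socket ID to pass to socket-aware allocators, or -1 if not found. */
int
heap_get_socket(const char *name)
{
	struct ext_heap *h = name != NULL ? find_heap(name) : NULL;

	assert(next_socket_id >= N_NUMA_NODES);
	return h != NULL ? h->socket_id : -1;
}
```

With this policy, a stale socket ID left over from a destroyed heap can
never silently resolve to a different heap.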

v1 -> v2 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (20):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  test: add unit tests for external memory support
  examples: add external memory example app
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide
  doc: add external memory sample application guide

 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  38 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  24 +-
 doc/guides/sample_app_ug/external_mem.rst     | 115 +++++
 doc/guides/sample_app_ug/index.rst            |   1 +
 drivers/bus/fslmc/fslmc_vfio.c                |   7 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 examples/external_mem/Makefile                |  62 +++
 examples/external_mem/extmem.c                | 461 ++++++++++++++++++
 examples/external_mem/meson.build             |  12 +
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   9 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   6 +-
 lib/librte_eal/common/include/rte_malloc.h    | 183 +++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_heap.c           | 300 ++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 393 ++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  17 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   7 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  31 +-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 +++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 45 files changed, 2096 insertions(+), 105 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/external_mem.rst
 create mode 100644 examples/external_mem/Makefile
 create mode 100644 examples/external_mem/extmem.c
 create mode 100644 examples/external_mem/meson.build
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v2 01/20] mem: add length to memseg list
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (17 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, shreyansh.jain, shahafs

Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
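
Notes:
    A small standalone illustration of the bounds check this patch
    simplifies. The struct below is a cut-down stand-in for
    struct rte_memseg_list (arr_len plays the role of memseg_arr.len),
    and addr_in_msl is a hypothetical helper, not a function from the
    tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct rte_memseg_list. */
struct msl_model {
	void *base_va;
	uint64_t page_sz;
	unsigned int arr_len; /* stands in for memseg_arr.len */
	size_t len;           /* new field: length of the covered area */
};

/* Returns 1 if addr falls inside the area covered by the list. */
int
addr_in_msl(const struct msl_model *msl, const void *addr)
{
	uintptr_t start = (uintptr_t)msl->base_va;
	/* previously: start + (size_t)msl->page_sz * msl->arr_len */
	uintptr_t end = start + msl->len;
	uintptr_t a = (uintptr_t)addr;

	assert(msl->len > 0);
	return a >= start && a < end;
}
```
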
 drivers/bus/pci/linux/pci.c                       | 2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
 lib/librte_eal/common/eal_common_memory.c         | 5 ++---
 lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
 6 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 static int
 find_max_end_va(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t sz = msl->memseg_arr.len * msl->page_sz;
+	size_t sz = msl->len;
 	void *end_va = RTE_PTR_ADD(msl->base_va, sz);
 	void **max_va = arg;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
 		}
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
+		msl->len = internal_config.memory;
 		msl->socket_id = 0;
 
 		/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index fbfb1b055..0868bf681 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
 
 	/* a memseg list was specified, check if it's the right one */
 	start = msl->base_va;
-	end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+	end = RTE_PTR_ADD(start, msl->len);
 
 	if (addr < start || addr >= end)
 		return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
 		msl = &mcfg->memsegs[msl_idx];
 
 		start = msl->base_va;
-		end = RTE_PTR_ADD(start,
-				(size_t)msl->page_sz * msl->memseg_arr.len);
+		end = RTE_PTR_ADD(start, msl->len);
 		if (addr >= start && addr < end)
 			break;
 	}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
+	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index aa95551a8..d040a2f71 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -828,7 +828,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	int msl_idx, seg_idx, ret, dir_fd = -1;
 
 	start_addr = (uintptr_t) msl->base_va;
-	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+	end_addr = start_addr + msl->len;
 
 	if ((uintptr_t)wa->ms->addr < start_addr ||
 			(uintptr_t)wa->ms->addr >= end_addr)
@@ -1314,6 +1314,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 		return -1;
 	}
 	local_msl->base_va = primary_msl->base_va;
+	local_msl->len = primary_msl->len;
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index dbf19499e..c522538bf 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -857,6 +857,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
@@ -1365,6 +1366,7 @@ eal_legacy_hugepage_init(void)
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
 		msl->socket_id = 0;
+		msl->len = internal_config.memory;
 
 		/* populate memsegs. each memseg is one page long */
 		for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1611,7 +1613,7 @@ eal_legacy_hugepage_init(void)
 		if (msl->memseg_arr.count > 0)
 			continue;
 		/* this is an unused list, deallocate it */
-		mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
 
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (18 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 01/20] mem: add length to memseg list Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-20  9:30   ` Andrew Rybchenko
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 03/20] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
                   ` (17 subsequent siblings)
  37 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so
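
    The two callback patterns used throughout this patch can be
    sketched in isolation as follows. This is a simplified model: the
    struct is a stand-in for struct rte_memseg_list, msl_walk is a
    hypothetical substitute for the real walk function (it just stops
    on the first non-zero return), and the callback names are only
    loosely modelled on the ones touched here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_memseg_list, reduced to the fields used below. */
struct msl_model {
	int socket_id;
	uint64_t page_sz;
	unsigned int count;    /* number of allocated pages */
	unsigned int external; /* new flag added by this patch */
};

typedef int (*msl_walk_t)(const struct msl_model *msl, void *arg);

/* Simplified walk: call func on each list, stop on first non-zero return. */
int
msl_walk(const struct msl_model *lists, size_t n, msl_walk_t func, void *arg)
{
	size_t i;

	assert(func != NULL);
	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Pattern 1: only internal DPDK memory is of interest, skip external. */
int
internal_size_cb(const struct msl_model *msl, void *arg)
{
	uint64_t *total = arg;

	if (msl->external)
		return 0;
	*total += msl->count * msl->page_sz;
	return 0;
}

struct min_pagesz_arg {
	int socket_id;
	uint64_t min;
};

/* Pattern 2 (mempool case): only consider lists on the requested socket. */
int
min_pagesz_cb(const struct msl_model *msl, void *arg)
{
	struct min_pagesz_arg *wa = arg;

	if (msl->socket_id != wa->socket_id)
		return 0;
	if (msl->page_sz < wa->min)
		wa->min = msl->page_sz;
	return 0;
}
```

    Matching on socket ID means an external heap with small pages no
    longer drags down the minimum page size computed for a mempool
    that lives on a regular hugepage socket.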

 doc/guides/rel_notes/deprecation.rst          | 15 ---------
 doc/guides/rel_notes/release_18_11.rst        | 12 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  7 +++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 +++--
 lib/librte_eal/common/eal_common_memory.c     |  4 +++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 ++++++
 lib/librte_eal/common/malloc_heap.c           |  9 ++++--
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 ++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 +++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 31 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 22 files changed, 122 insertions(+), 40 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e2dbee317..12122cb55 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..e2cbc82da 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -68,6 +68,13 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -84,6 +91,9 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK.
+
 Removed Items
 -------------
 
@@ -129,7 +139,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ec63bc6e2..d9ed15880 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b2444096c..885c59c8a 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0868bf681..55a11bf4d 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
@@ -547,6 +550,7 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
 	return ret;
 }
 
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index c4b7f4cff..b381d1cb6 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 12aaf2d72..8c37b9d7c 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -756,8 +759,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index d040a2f71..8b0bbe43f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1250,6 +1250,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1298,6 +1301,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1328,6 +1334,9 @@ secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
 	int msl_idx;
 	int *data;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..d61c77da3 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (!valid)
+		return 0;
+
+	if (msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +485,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1
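
For readers following along, the pattern this patch applies to nearly every
walk callback (return 0 early when the list is marked external) can be
sketched outside of DPDK roughly as below. This is only an illustration, not
part of the patch: msl_sketch, msl_walk_sketch and internal_size_cb are
hypothetical stand-ins for struct rte_memseg_list, rte_memseg_list_walk() and
the physmem_size() callback, and the walk is simplified to stop on any
non-zero return value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_memseg_list (only fields used here). */
struct msl_sketch {
	int socket_id;
	uint64_t page_sz;
	unsigned int external; /* 1 if this list points to external memory */
	size_t count;          /* number of pages currently in the list */
};

typedef int (*msl_walk_sketch_t)(const struct msl_sketch *msl, void *arg);

/* Simplified walk: call func on every list, stop on non-zero return. */
static int
msl_walk_sketch(const struct msl_sketch *lists, size_t n,
		msl_walk_sketch_t func, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Sum up internal memory only; external lists are skipped. */
static int
internal_size_cb(const struct msl_sketch *msl, void *arg)
{
	uint64_t *total_len = arg;

	if (msl->external)
		return 0; /* not an error, just keep walking */

	*total_len += msl->count * msl->page_sz;
	return 0;
}
```

Returning 0 rather than a negative value matters here: in this sketch, as in
the callbacks touched by the patch, 0 means "continue with the next list", so
an external list is silently ignored instead of aborting the walk.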


* [dpdk-dev] [PATCH v2 03/20] malloc: index heaps using heap ID rather than NUMA node
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (19 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 04/20] mem: do not check for invalid socket ID Anatoly Burakov
                   ` (16 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bruce Richardson, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, shreyansh.jain, shahafs

Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. The heap ID of a DPDK-internal heap is the NUMA
node's index within the detected NUMA node list. Heap IDs for
external heaps will be assigned in order of their creation.
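
As a rough sketch of the lookup this implies (an illustration under stated
assumptions, not the code in the diff below): the heap array is indexed by
heap ID, and each heap records the socket ID it serves, so translating a
socket ID into a heap ID is a linear search. heap_sketch,
MAX_HEAPS_SKETCH and socket_to_heap_id_sketch are hypothetical names standing
in for struct malloc_heap, RTE_MAX_HEAPS and malloc_socket_to_heap_id().

```c
#include <stddef.h>

#define MAX_HEAPS_SKETCH 8 /* stand-in for RTE_MAX_HEAPS */

/* Hypothetical stand-in for struct malloc_heap (only the field used here). */
struct heap_sketch {
	unsigned int socket_id; /* socket ID this heap allocates for */
};

/*
 * Return the index of the first heap serving socket_id, or -1 if no heap
 * serves it. The returned index is the heap ID.
 */
static int
socket_to_heap_id_sketch(const struct heap_sketch *heaps,
		unsigned int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS_SKETCH; i++) {
		if (heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

With heaps for sockets 0 and 1 in slots 0 and 1, and an external heap with a
made-up socket ID of 256 in slot 2, looking up 256 yields heap ID 2, while a
socket ID that no heap serves yields -1.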

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_base                            |  1 +
 config/rte_config.h                           |  1 +
 .../common/include/rte_eal_memconfig.h        |  4 +-
 .../common/include/rte_malloc_heap.h          |  1 +
 lib/librte_eal/common/malloc_heap.c           | 98 +++++++++++++------
 lib/librte_eal/common/malloc_heap.h           |  3 +
 lib/librte_eal/common/rte_malloc.c            | 41 +++++---
 7 files changed, 106 insertions(+), 43 deletions(-)

diff --git a/config/common_base b/config/common_base
index 4bcbaf923..e96c52054 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index a8e479774..1f330c24e 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -21,6 +21,7 @@
 /****** library defines ********/
 
 /* EAL defines */
+#define RTE_MAX_HEAPS 32
 #define RTE_MAX_MEMSEG_LISTS 128
 #define RTE_MAX_MEMSEG_PER_LIST 8192
 #define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 6baa6854f..d7920a4e0 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
 
-	/* Heaps of Malloc per socket */
-	struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+	/* Heaps of Malloc */
+	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {
 
 	unsigned alloc_count;
 	size_t total_size;
+	unsigned int socket_id;
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 8c37b9d7c..c4d303533 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 	return check_flag & flags;
 }
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (heap->socket_id == socket_id)
+			return i;
+	}
+	return -1;
+}
+
 /*
  * Expand the heap with a memory area.
  */
@@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct rte_memseg_list *found_msl;
 	struct malloc_heap *heap;
-	int msl_idx;
+	int msl_idx, heap_idx;
 
 	if (msl->external)
 		return 0;
 
-	heap = &mcfg->malloc_heaps[msl->socket_id];
+	heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+	heap = &mcfg->malloc_heaps[heap_idx];
 
 	/* msl is const, so find it */
 	msl_idx = msl - mcfg->memsegs;
@@ -111,6 +127,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
+	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -563,12 +580,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 
 /* this will try lower page sizes first */
 static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
-		unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+	int socket_id;
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -586,12 +605,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
 	 * we may still be able to allocate memory from appropriate page sizes,
 	 * we just need to request more memory first.
 	 */
+
+	socket_id = rte_socket_id_by_idx(heap_id);
+	/*
+	 * if socket ID is negative, we cannot find a socket ID for this heap -
+	 * which means it's an external heap. those can have unexpected page
+	 * sizes, so if the user asked to allocate from there - assume user
+	 * knows what they're doing, and allow allocating from there with any
+	 * page size flags.
+	 */
+	if (socket_id < 0)
+		size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
 	ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
 	if (ret != NULL)
 		goto alloc_unlock;
 
-	if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
-			contig)) {
+	/* if socket ID is invalid, this is an external heap */
+	if (socket_id < 0)
+		goto alloc_unlock;
+
+	if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+			bound, contig)) {
 		ret = heap_alloc(heap, type, size, flags, align, bound, contig);
 
 		/* this should have succeeded */
@@ -607,7 +642,7 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, heap_id, i;
 	void *ret;
 
 	/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -622,22 +657,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
-			contig);
+	ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+			bound, contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
-	/* try other heaps */
+	/* try other heaps. we are only iterating through native DPDK sockets,
+	 * so external heaps won't be included.
+	 */
 	for (i = 0; i < (int) rte_socket_count(); i++) {
-		cur_socket = rte_socket_id_by_idx(i);
-		if (cur_socket == socket)
+		if (i == heap_id)
 			continue;
-		ret = heap_alloc_on_socket(type, size, cur_socket, flags,
-				align, bound, contig);
+		ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+				bound, contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -645,11 +683,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 }
 
 static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
-		size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+		unsigned int flags, size_t align, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -667,7 +705,7 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		size_t align, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, i, cur_socket, heap_id;
 	void *ret;
 
 	/* return NULL if align is not power-of-2 */
@@ -682,11 +720,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+	ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
 			contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
@@ -696,8 +736,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		cur_socket = rte_socket_id_by_idx(i);
 		if (cur_socket == socket)
 			continue;
-		ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
-				align, contig);
+		ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+				contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -762,7 +802,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* ...of which we can't avail if we are in legacy mode, or if this is an
 	 * externally allocated segment.
 	 */
-	if (internal_config.legacy_mem || msl->external)
+	if (internal_config.legacy_mem || (msl->external > 0))
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
@@ -919,7 +959,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 int
 malloc_heap_get_stats(struct malloc_heap *heap,
@@ -957,7 +997,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f);
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
 int
 rte_eal_malloc_heap_init(void);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..dfcdf380a 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx, ret = -1;
 
-	if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
-		return -1;
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
 
-	return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+	heap_idx = malloc_socket_to_heap_id(socket);
+	if (heap_idx < 0)
+		goto unlock;
+
+	ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+			socket_stats);
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
 }
 
 /*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int idx;
 
-	for (idx = 0; idx < rte_socket_count(); idx++) {
-		unsigned int socket = rte_socket_id_by_idx(idx);
-		fprintf(f, "Heap on socket %i:\n", socket);
-		malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		fprintf(f, "Heap id: %u\n", idx);
+		malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
 	}
 
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
 /*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
 void
 rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 {
-	unsigned int socket;
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int heap_id;
 	struct rte_malloc_socket_stats sock_stats;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
 	/* Iterate through all initialised heaps */
-	for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
-		if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
-			continue;
+	for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 
-		fprintf(f, "Socket:%u\n", socket);
+		malloc_heap_get_stats(heap, &sock_stats);
+
+		fprintf(f, "Heap id:%u\n", heap_id);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
 		fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
 	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	return;
 }
 
-- 
2.17.1


* [dpdk-dev] [PATCH v2 04/20] mem: do not check for invalid socket ID
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (20 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 03/20] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 05/20] flow_classify: " Anatoly Burakov
                   ` (15 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
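
To make the new semantics concrete, here is a minimal sketch (illustration
only, with hypothetical names) of the two checks that remain after this
change: callers reject only negative socket IDs other than SOCKET_ID_ANY,
and the no-hugepages fallback to SOCKET_ID_ANY is applied only to socket IDs
below the NUMA node limit, so that requests aimed at an external heap keep
their socket ID. SOCKET_ID_ANY_SKETCH and MAX_NUMA_NODES_SKETCH mirror
SOCKET_ID_ANY (-1) and the default RTE_MAX_NUMA_NODES (8); whether a
non-negative socket ID really has a heap behind it is left to the allocator.

```c
#define SOCKET_ID_ANY_SKETCH (-1)  /* mirrors SOCKET_ID_ANY */
#define MAX_NUMA_NODES_SKETCH 8    /* mirrors the default RTE_MAX_NUMA_NODES */

/* Caller-side check: only obviously bogus (negative) IDs are rejected. */
static int
socket_param_ok_sketch(int socket_id)
{
	return socket_id == SOCKET_ID_ANY_SKETCH || socket_id >= 0;
}

/*
 * Without hugepages, NUMA placement is meaningless, so internal socket IDs
 * collapse to "any". IDs at or above the NUMA node limit are assumed to
 * refer to external heaps and are passed through unchanged.
 */
static int
effective_socket_sketch(int socket_id, int has_hugepages)
{
	if (!has_hugepages && socket_id < MAX_NUMA_NODES_SKETCH)
		return SOCKET_ID_ANY_SKETCH;
	return socket_id;
}
```

For example, a made-up external socket ID of 300 passes the caller-side
check and survives the no-hugepages fallback, whereas socket 3 would be
turned into "any socket" when hugepages are unavailable.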

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e2cbc82da..c04685d17 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -75,6 +75,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - The ``socket_id`` parameter across the entire DPDK has gained an additional
+    meaning, as some socket IDs will now represent externally allocated
+    memory. No changes are required for existing code, as backwards
+    compatibility is kept, and those who do not use this feature will not
+    see these extra socket IDs. New APIs must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether a socket ID is valid.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index c4d303533..1dcb1de8f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -649,7 +649,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index dfcdf380a..458c44ba6 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1


* [dpdk-dev] [PATCH v2 05/20] flow_classify: do not check for invalid socket ID
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (21 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 04/20] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 06/20] pipeline: " Anatoly Burakov
                   ` (14 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: Bernard Iremonger, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_flow_classify/rte_flow_classify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_FLOW_CLASSIFY_LOG(ERR,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1

* [dpdk-dev] [PATCH v2 06/20] pipeline: do not check for invalid socket ID
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (22 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 05/20] flow_classify: " Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 07/20] sched: " Anatoly Burakov
                   ` (13 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs

We will be assigning "invalid" socket IDs (at or above
RTE_MAX_NUMA_NODES) to external heaps, and malloc will now be able
to verify whether a supplied socket ID is in fact valid, which makes
the upper-bound socket checks in this library obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_pipeline/rte_pipeline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_LOG(ERR, PIPELINE,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1

* [dpdk-dev] [PATCH v2 07/20] sched: do not check for invalid socket ID
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (23 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 06/20] pipeline: " Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 08/20] malloc: add name to malloc heaps Anatoly Burakov
                   ` (12 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs

We will be assigning "invalid" socket IDs (at or above
RTE_MAX_NUMA_NODES) to external heaps, and malloc will now be able
to verify whether a supplied socket ID is in fact valid, which makes
the upper-bound socket checks in this library obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_sched/rte_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
 		return -1;
 
 	/* socket */
-	if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+	if (params->socket < 0)
 		return -3;
 
 	/* rate */
-- 
2.17.1

* [dpdk-dev] [PATCH v2 08/20] malloc: add name to malloc heaps
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (24 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 07/20] sched: " Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 09/20] malloc: add function to query socket ID of named heap Anatoly Burakov
                   ` (11 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

We will need to refer to external heaps in some way. While we use
heap IDs internally, the external API needs something more
user-friendly. So, we will be using a string to uniquely identify
a heap. Default DPDK heaps are named "socket_<id>".

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
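The default naming scheme can be sketched roughly as follows. This is a
standalone approximation: the function name and the length macro are
hypothetical, with the macro mirroring the RTE_HEAP_NAME_MAX_LEN value
introduced in this patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Mirrors RTE_HEAP_NAME_MAX_LEN from this patch; hypothetical name. */
#define SKETCH_HEAP_NAME_MAX_LEN 32

/*
 * Format the default name of an internal heap, e.g. "socket_0".
 * Returns 0 on success, -1 if the name was truncated or formatting failed.
 */
int sketch_default_heap_name(char *buf, size_t buf_len, int socket_id)
{
	int n = snprintf(buf, buf_len, "socket_%i", socket_id);

	return (n < 0 || (size_t)n >= buf_len) ? -1 : 0;
}
```

Checking the snprintf() return value is a choice made for this sketch so
that a truncated name is reported rather than silently stored.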
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 15 ++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	size_t total_size;
 	unsigned int socket_id;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1dcb1de8f..951991296 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1022,6 +1021,20 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	/* assign names to default DPDK heaps */
+	for (i = 0; i < rte_socket_count(); i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+		char heap_name[RTE_HEAP_NAME_MAX_LEN];
+		int socket_id = rte_socket_id_by_idx(i);
+
+		snprintf(heap_name, sizeof(heap_name) - 1,
+				"socket_%i", socket_id);
+		strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+		heap->socket_id = socket_id;
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 458c44ba6..0515d47f3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1

* [dpdk-dev] [PATCH v2 09/20] malloc: add function to query socket ID of named heap
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (25 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 08/20] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 10/20] malloc: allow creating malloc heaps Anatoly Burakov
                   ` (10 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

External heaps will each have their own "fake" socket ID, so add
a function that maps a heap name to its socket ID.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
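The lookup described above could look roughly like the standalone model
below. It uses a plain array and errno in place of the shared memory config
and rte_errno, and omits locking; all sketch_* names and the table size are
hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN */
#define SKETCH_MAX_HEAPS 4          /* assumed table size for this sketch */

struct sketch_named_heap {
	char name[SKETCH_HEAP_NAME_MAX_LEN];
	unsigned int socket_id;
};

/* Bounded string length, equivalent in spirit to strnlen(). */
static size_t sketch_strnlen(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/* Returns the socket ID of the named heap, or -1 with errno set. */
int sketch_heap_get_socket(const struct sketch_named_heap *heaps,
		const char *name)
{
	size_t len, i;

	if (name == NULL) {
		errno = EINVAL;
		return -1;
	}
	len = sketch_strnlen(name, SKETCH_HEAP_NAME_MAX_LEN);
	/* empty names and names that do not fit are rejected up front */
	if (len == 0 || len == SKETCH_HEAP_NAME_MAX_LEN) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		if (strncmp(name, heaps[i].name,
				SKETCH_HEAP_NAME_MAX_LEN) == 0)
			return (int)heaps[i].socket_id;
	}
	errno = ENOENT;
	return -1;
}
```

Unused table slots keep an empty name, so they can never match a valid
(non-empty) lookup key.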
 lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 37 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..8870732a6 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,20 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Find socket ID corresponding to a named heap.
+ *
+ * @param name
+ *   Heap name to find socket ID for
+ * @return
+ *   Socket ID in case of success (a non-negative number)
+ *   -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``name`` was NULL
+ *     ENOENT - heap identified by the name ``name`` was not found
+ */
+int __rte_experimental
+rte_malloc_heap_get_socket(const char *name);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 0515d47f3..ce18ac79c 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <sys/queue.h>
 
+#include <rte_errno.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
 #include <rte_eal.h>
@@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f)
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
+int
+rte_malloc_heap_get_socket(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int idx;
+	int ret;
+
+	if (name == NULL ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) {
+			heap = tmp;
+			break;
+		}
+	}
+
+	if (heap != NULL) {
+		ret = heap->socket_id;
+	} else {
+		rte_errno = ENOENT;
+		ret = -1;
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 344a43d32..6fd729b8b 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -311,6 +311,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v2 10/20] malloc: allow creating malloc heaps
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (26 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 09/20] malloc: add function to query socket ID of named heap Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 11/20] malloc: allow destroying heaps Anatoly Burakov
                   ` (9 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

Add an API to create new malloc heaps. They are assigned socket
IDs at or above RTE_MAX_NUMA_NODES, to avoid clashing with
internal heaps.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
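The socket ID assignment can be sketched as below. This is a simplified
model: the counter is passed in explicitly rather than kept as a static
variable, and the sketch_* names and the node limit are hypothetical.

```c
#include <stdint.h>

/* Assumed value for illustration only; stands in for RTE_MAX_NUMA_NODES. */
#define SKETCH_MAX_NUMA_NODES 8
#define SKETCH_CONST_MAX(a, b) ((a) > (b) ? (a) : (b))
/* External IDs start at 256 or at the node limit, whichever is larger. */
#define SKETCH_EXT_MIN_SOCKET_ID \
	SKETCH_CONST_MAX(1 << 8, SKETCH_MAX_NUMA_NODES)

/*
 * Hand out the next external socket ID and advance the counter.
 * Returns -1 once the counter no longer fits in a non-negative int.
 */
int sketch_next_external_socket_id(uint32_t *next)
{
	uint32_t id = *next;

	if (id > (uint32_t)INT32_MAX)
		return -1;
	*next = id + 1;
	return (int)id;
}
```

Starting the counter at SKETCH_EXT_MIN_SOCKET_ID keeps every external ID
outside the range that internal heaps can occupy.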
 lib/librte_eal/common/include/rte_malloc.h | 19 ++++++++
 lib/librte_eal/common/malloc_heap.c        | 30 +++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 52 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 105 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8870732a6..182afab1c 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 951991296..960f40d6b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket IDs at a very high number */
+#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b)) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1017,6 +1021,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	static uint32_t next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index ce18ac79c..39875fe69 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -286,3 +287,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 6fd729b8b..c93dcf1a3 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -311,6 +311,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v2 11/20] malloc: allow destroying heaps
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (27 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 10/20] malloc: allow creating malloc heaps Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 12/20] malloc: allow adding memory to named heaps Anatoly Burakov
                   ` (8 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

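Add an API to destroy a specified heap. Destruction fails if the
heap is an internal one, still has outstanding allocations, or
still has memory segments attached.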

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
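The destroy preconditions can be modelled roughly as follows. This sketch
folds the public and internal checks into one function, uses errno instead
of rte_errno, and omits locking; the struct, the function name and the node
limit are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed value for illustration only; stands in for RTE_MAX_NUMA_NODES. */
#define SKETCH_MAX_NUMA_NODES 8

struct sketch_ext_heap {
	char name[32];
	unsigned int alloc_count; /* outstanding allocations */
	void *first;              /* first element, NULL when no memory */
	void *last;               /* last element, NULL when no memory */
	size_t total_size;
	unsigned int socket_id;
};

/* Returns 0 and clears the slot on success, -1 with errno set otherwise. */
int sketch_heap_destroy(struct sketch_ext_heap *heap)
{
	/* internal (per-socket) heaps must never be destroyed */
	if (heap->socket_id < SKETCH_MAX_NUMA_NODES) {
		errno = EPERM;
		return -1;
	}
	/* refuse while allocations or memory segments remain */
	if (heap->alloc_count != 0 ||
			heap->first != NULL || heap->last != NULL) {
		errno = EBUSY;
		return -1;
	}
	/* clearing the name marks the slot as free for reuse */
	memset(heap, 0, sizeof(*heap));
	return 0;
}
```

Zeroing the whole structure also empties the name, which is how a slot is
recognised as free when a new heap is created.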
 lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++
 lib/librte_eal/common/malloc_heap.c        | 22 ++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 58 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 107 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 182afab1c..8a8cc1e6d 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
 
+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ *   from the heap has been freed back to the heap
+ *
+ * @note This function will return a failure result if not all memory segments
+ *   were removed from the heap prior to its destruction
+ *
+ * @param heap_name
+ *   Name of the heap to destroy.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     ENOENT - heap by the name of ``heap_name`` was not found
+ *     EPERM  - attempting to destroy reserved heap
+ *     EBUSY  - heap still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 960f40d6b..117b7634c 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1047,6 +1047,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	return 0;
 }
 
+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+	if (heap->alloc_count != 0) {
+		RTE_LOG(ERR, EAL, "Heap is still in use\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->first != NULL || heap->last != NULL) {
+		RTE_LOG(ERR, EAL, "Heap still contains memory segments\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->total_size != 0)
+		RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+	/* after this, the lock will be dropped */
+	memset(heap, 0, sizeof(*heap));
+
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index eebee16dc..75278da3c 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 39875fe69..6734b0d09 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -288,6 +288,21 @@ rte_malloc_virt2iova(const void *addr)
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
 
+static struct malloc_heap *
+find_named_heap(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN))
+			return heap;
+	}
+	return NULL;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -338,3 +353,46 @@ rte_malloc_heap_create(const char *heap_name)
 
 	return ret;
 }
+
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* start from non-socket heaps */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to destroy internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	/* sanity checks done, now we can destroy the heap */
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_destroy(heap);
+
+	/* if we failed, lock is still active */
+	if (ret < 0)
+		rte_spinlock_unlock(&heap->lock);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index c93dcf1a3..1d3ca0716 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -312,6 +312,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_create;
+	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v2 12/20] malloc: allow adding memory to named heaps
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (28 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 11/20] malloc: allow destroying heaps Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 13/20] malloc: allow removing memory from " Anatoly Burakov
                   ` (7 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

Add an API to add externally allocated memory to a malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If an IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
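The per-page bookkeeping described above can be sketched as below. This is
a standalone model, not the patch's implementation: it fills a plain array
instead of an fbarray-backed memseg list, and the sketch_* names and the
bad-IOVA constant are hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RTE_BAD_IOVA; assumed to be an all-ones 64-bit value. */
#define SKETCH_BAD_IOVA UINT64_MAX

struct sketch_page {
	void *addr;    /* virtual address of the page */
	uint64_t iova; /* IO address, or SKETCH_BAD_IOVA if unknown */
	size_t len;    /* page length in bytes */
};

/*
 * Describe each page of [va_addr, va_addr + len) in `pages`, which must
 * have room for len / page_sz entries.
 * Returns the number of pages described, or -1 on invalid input.
 */
int sketch_describe_pages(struct sketch_page *pages, void *va_addr,
		size_t len, const uint64_t *iova_addrs,
		unsigned int n_pages, size_t page_sz)
{
	size_t n, i;

	/* page size must be a non-zero power of two */
	if (va_addr == NULL || page_sz == 0 ||
			(page_sz & (page_sz - 1)) != 0)
		return -1;
	n = len / page_sz;
	/* the IOVA table, when given, must cover every page */
	if (iova_addrs != NULL && n != n_pages)
		return -1;
	for (i = 0; i < n; i++) {
		pages[i].addr = (char *)va_addr + i * page_sz;
		pages[i].iova = iova_addrs == NULL ?
				SKETCH_BAD_IOVA : iova_addrs[i];
		pages[i].len = page_sz;
	}
	return (int)n;
}
```

As in the patch, n_pages is only consulted when an IOVA table is supplied;
otherwise the page count is derived from len and page_sz alone.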
 lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 74 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 51 +++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 169 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8a8cc1e6d..47f867a05 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,45 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Add memory chunk to a heap with specified name.
+ *
+ * @note Multiple memory chunks can be added to the same heap
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ *   malloc heap. Failing to do so will result in undefined behavior, up to and
+ *   including segmentation faults.
+ *
+ * @note Calling this function will erase any contents already present at the
+ *   supplied memory address.
+ *
+ * @param heap_name
+ *   Name of the heap to add memory chunk to
+ * @param va_addr
+ *   Start of virtual area to add to the heap
+ * @param len
+ *   Length of virtual area to add to the heap
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to add memory to a reserved heap
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 117b7634c..0b19a064c 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1021,6 +1021,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	char fbarray_name[RTE_FBARRAY_NAME_LEN];
+	struct rte_memseg_list *msl = NULL;
+	struct rte_fbarray *arr;
+	size_t seg_len = n_pages * page_sz;
+	unsigned int i;
+
+	/* first, find a free memseg list */
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+		if (tmp->base_va == NULL) {
+			msl = tmp;
+			break;
+		}
+	}
+	if (msl == NULL) {
+		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+			heap->name, va_addr);
+
+	/* create the backing fbarray */
+	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+			sizeof(struct rte_memseg)) < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+		return -1;
+	}
+	arr = &msl->memseg_arr;
+
+	/* fbarray created, fill it up */
+	for (i = 0; i < n_pages; i++) {
+		struct rte_memseg *ms;
+
+		rte_fbarray_set_used(arr, i);
+		ms = rte_fbarray_get(arr, i);
+		ms->addr = RTE_PTR_ADD(va_addr, i * page_sz);
+		ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+		ms->hugepage_sz = page_sz;
+		ms->len = page_sz;
+		ms->nchannel = rte_memory_get_nchannel();
+		ms->nrank = rte_memory_get_nrank();
+		ms->socket_id = heap->socket_id;
+	}
+
+	/* set up the memseg list */
+	msl->base_va = va_addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = heap->socket_id;
+	msl->len = seg_len;
+	msl->version = 0;
+	msl->external = 1;
+
+	/* erase contents of new memory */
+	memset(va_addr, 0, seg_len);
+
+	/* now, add newly minted memory to the malloc heap */
+	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+	heap->total_size += seg_len;
+
+	/* all done! */
+	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+			heap->name, va_addr);
+
+	return 0;
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 75278da3c..237ce9dc2 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 6734b0d09..9d2041b7b 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -303,6 +303,57 @@ find_named_heap(const char *name)
 	return NULL;
 }
 
+int
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int n;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL ||
+			page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		/* hotplug lock is not held yet, so return directly */
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot add memory to internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	n = len / page_sz;
+	if (n != n_pages && iova_addrs != NULL) {
+		rte_errno = EINVAL;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+			page_sz);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 1d3ca0716..0d052d20a 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -314,6 +314,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_create;
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_memory_add;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v2 13/20] malloc: allow removing memory from named heaps
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (29 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 12/20] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 14/20] malloc: allow attaching to external memory chunks Anatoly Burakov
                   ` (6 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

Add an API to remove memory from a specified heap. Removal first
checks that all elements within the region are free, and that the
region is the one originally added to the heap (by comparing its
length to the length of the memory addressed by the underlying
memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
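
Note for readers: the two checks described above can be illustrated with
a small standalone model. This is only a sketch and does not use the DPDK
structures: struct model_elem, struct model_region and
model_remove_check() are hypothetical names introduced here, and the
returned error codes simply mirror those documented for
rte_malloc_heap_memory_remove().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a heap element and its memseg list. */
enum model_state { MODEL_FREE, MODEL_BUSY };

struct model_region {
	size_t len;                  /* length of the region as added */
};

struct model_elem {
	struct model_elem *next;     /* next element, in address order */
	struct model_region *region; /* region this element belongs to */
	enum model_state state;
	size_t size;                 /* size of this element */
};

/*
 * Return 0 if the element starting at va_addr may be removed, or a
 * negative errno value describing why it may not.
 */
static int
model_remove_check(struct model_elem *first, void *va_addr, size_t len)
{
	struct model_elem *e = first;

	if (va_addr == NULL || len == 0)
		return -EINVAL;

	/* elements are assumed sorted by address, so stop once past it */
	while (e != NULL && (void *)e != va_addr) {
		if ((uintptr_t)e > (uintptr_t)va_addr)
			return -ENOENT;
		e = e->next;
	}
	/* the chunk must be the whole region that was originally added */
	if (e == NULL || e->region->len != len)
		return -ENOENT;
	/* a single free element must span the entire region */
	if (e->state == MODEL_BUSY || e->size != len)
		return -EBUSY;
	return 0;
}
```

In this model a region is removable only when one free element covers all
of it; any outstanding allocation splits the element and yields -EBUSY.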
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++
 lib/librte_eal/common/malloc_heap.c        | 54 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 125 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 47f867a05..9bbe8e3af 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -302,6 +302,33 @@ int __rte_experimental
 rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+/**
+ * Remove memory chunk from heap with specified name.
+ *
+ * @note Memory chunk being removed must be the same as one that was added;
+ *   partially removing memory chunks is not supported
+ *
+ * @note Memory area must not contain any allocated elements to allow its
+ *   removal from the heap
+ *
+ * @param heap_name
+ *   Name of the heap to remove memory from
+ * @param va_addr
+ *   Virtual address to remove from the heap
+ * @param len
+ *   Length of virtual area to remove from the heap
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to remove memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ *     EBUSY  - memory chunk still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 0b19a064c..a12bbbbee 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1021,6 +1021,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+	struct malloc_heap *heap = elem->heap;
+	struct rte_memseg_list *msl;
+
+	msl = elem->msl;
+
+	/* this element can be removed */
+	malloc_elem_free_list_remove(elem);
+	malloc_elem_hide_region(elem, elem, len);
+
+	heap->total_size -= len;
+
+	memset(elem, 0, sizeof(*elem));
+
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -1095,6 +1121,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	return 0;
 }
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len)
+{
+	struct malloc_elem *elem = heap->first;
+
+	/* find element with specified va address */
+	while (elem != NULL && elem != va_addr) {
+		elem = elem->next;
+		/* stop if we've blown past our VA */
+		if (elem > (struct malloc_elem *)va_addr) {
+			rte_errno = ENOENT;
+			return -1;
+		}
+	}
+	/* check if element was found */
+	if (elem == NULL || elem->msl->len != len) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+	/* if element's size is not equal to segment len, segment is busy */
+	if (elem->state == ELEM_BUSY || elem->size != len) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+	return destroy_seg(elem, len);
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 237ce9dc2..e48996d52 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -43,6 +43,10 @@ int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9d2041b7b..aed066882 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -354,6 +354,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot remove memory from internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 0d052d20a..f10c34130 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -315,6 +315,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_remove;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v2 14/20] malloc: allow attaching to external memory chunks
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (30 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 13/20] malloc: allow removing memory from " Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 15/20] malloc: allow detaching from external memory Anatoly Burakov
                   ` (5 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 48 +++++++++--
 lib/librte_eal/common/rte_malloc.c         | 93 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 135 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 9bbe8e3af..440496cd9 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket,
  *
  * @note Multiple memory chunks can be added to the same heap
  *
+ * @note Before this memory can be accessed in another process, it must be
+ *   attached in that process by calling
+ *   ``rte_malloc_heap_memory_attach``.
+ *
  * @note Memory must be previously allocated for DPDK to be able to use it as a
  *   malloc heap. Failing to do so will result in undefined behavior, up to and
  *   including segmentation faults.
@@ -329,21 +333,48 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 int __rte_experimental
 rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Attach to a chunk of external memory already added by another process.
+ *
+ * @note This function must be called before any attempt is made to use an
+ *   already existing external memory chunk. This function does *not* need to
+ *   be called if a call to ``rte_malloc_heap_memory_add`` was made in the
+ *   current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to attach to
+ * @param len
+ *   Length of memory chunk to attach to
+ * @return
+ *   0 on successful attach
+ *   -1 on unsuccessful attach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to attach memory to a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
  * @note Heaps created via this call will automatically get assigned a unique
  *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
  *
+ * @note This function can only be called in primary process.
+ *
  * @param heap_name
  *   Name of the heap to create.
  *
  * @return
  *   - 0 on successful creation
  *   - -1 in case of error, with rte_errno set to one of the following:
- *     EINVAL - ``heap_name`` was NULL, empty or too long
- *     EEXIST - heap by name of ``heap_name`` already exists
- *     ENOSPC - no more space in internal config to store a new heap
+ *     EINVAL          - ``heap_name`` was NULL, empty or too long
+ *     EEXIST          - heap by name of ``heap_name`` already exists
+ *     ENOSPC          - no more space in internal config to store a new heap
+ *     E_RTE_SECONDARY - attempted to create a heap in secondary process
  */
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
@@ -357,16 +388,19 @@ rte_malloc_heap_create(const char *heap_name);
  * @note This function will return a failure result if not all memory segments
  *   were removed from the heap prior to its destruction
  *
+ * @note This function can only be called in primary process.
+ *
  * @param heap_name
  *   Name of the heap to create.
  *
  * @return
  *   - 0 on success
  *   - -1 in case of error, with rte_errno set to one of the following:
- *     EINVAL - ``heap_name`` was NULL, empty or too long
- *     ENOENT - heap by the name of ``heap_name`` was not found
- *     EPERM  - attempting to destroy reserved heap
- *     EBUSY  - heap still contains data
+ *     EINVAL          - ``heap_name`` was NULL, empty or too long
+ *     ENOENT          - heap by the name of ``heap_name`` was not found
+ *     EPERM           - attempting to destroy reserved heap
+ *     EBUSY           - heap still contains data
+ *     E_RTE_SECONDARY - attempted to destroy a heap in secondary process
  */
 int __rte_experimental
 rte_malloc_heap_destroy(const char *heap_name);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index aed066882..bc22d21e4 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -393,6 +393,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+struct sync_mem_walk_arg {
+	void *va_addr;
+	size_t len;
+	int result;
+};
+
+static int
+attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct sync_mem_walk_arg *wa = arg;
+	size_t len = msl->page_sz * msl->memseg_arr.len;
+
+	if (msl->base_va == wa->va_addr &&
+			len == wa->len) {
+		struct rte_memseg_list *found_msl;
+		int msl_idx, ret;
+
+		/* msl is const */
+		msl_idx = msl - mcfg->memsegs;
+		found_msl = &mcfg->memsegs[msl_idx];
+
+		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+
+		if (ret < 0)
+			wa->result = -rte_errno;
+		else
+			wa->result = 0;
+		return 1;
+	}
+	return 0;
+}
+
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	struct sync_mem_walk_arg wa;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to attach to internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* find corresponding memseg list to attach to */
+	wa.va_addr = va_addr;
+	wa.len = len;
+	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+
+	/* we're already holding a read lock */
+	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+
+	if (wa.result < 0) {
+		rte_errno = -wa.result;
+		ret = -1;
+	} else {
+		ret = 0;
+	}
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -400,6 +483,11 @@ rte_malloc_heap_create(const char *heap_name)
 	struct malloc_heap *heap = NULL;
 	int i, ret;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		rte_errno = E_RTE_SECONDARY;
+		return -1;
+	}
+
 	if (heap_name == NULL ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
@@ -451,6 +539,11 @@ rte_malloc_heap_destroy(const char *heap_name)
 	struct malloc_heap *heap = NULL;
 	int ret;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		rte_errno = E_RTE_SECONDARY;
+		return -1;
+	}
+
 	if (heap_name == NULL ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index f10c34130..822c5693a 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -315,6 +315,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_attach;
 	rte_malloc_heap_memory_remove;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1


* [dpdk-dev] [PATCH v2 15/20] malloc: allow detaching from external memory
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (31 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 14/20] malloc: allow attaching to external memory chunks Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 16/20] test: add unit tests for external memory support Anatoly Burakov
                   ` (4 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

Add API to detach from existing chunk of external memory in a
process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
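
Note for readers: attach and detach share one lookup, which matches a
memseg list by base address and total length and fails with ENOENT
unless a match is found. The standalone sketch below models that
pattern only. struct model_list, struct model_walk_arg and
model_sync() are hypothetical names, and the "attached" flag stands in
for the real fbarray attach/detach calls.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memseg list and its backing array. */
struct model_list {
	void *base_va;
	size_t page_sz;
	size_t n_pages;
	bool attached;      /* models the per-process attach state */
};

struct model_walk_arg {
	void *va_addr;
	size_t len;
	int result;
	bool attach;
};

/* Walk callback: return 1 to stop the walk, 0 to keep going. */
static int
model_sync_walk(struct model_list *l, struct model_walk_arg *wa)
{
	size_t len = l->page_sz * l->n_pages;

	if (l->base_va != wa->va_addr || len != wa->len)
		return 0;

	l->attached = wa->attach;
	wa->result = 0;
	return 1;
}

static int
model_sync(struct model_list *lists, size_t n_lists, void *va_addr,
		size_t len, bool attach)
{
	struct model_walk_arg wa = {
		.va_addr = va_addr,
		.len = len,
		.result = -ENOENT, /* fail unless a list matches */
		.attach = attach,
	};
	size_t i;

	for (i = 0; i < n_lists; i++)
		if (model_sync_walk(&lists[i], &wa))
			break;

	return wa.result;
}
```

Because the result starts out as -ENOENT, a wrong address or a wrong
length is reported the same way as a chunk that was never added.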
 lib/librte_eal/common/include/rte_malloc.h | 27 ++++++++++++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 27 ++++++++++++++++++----
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 440496cd9..d2236c421 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
  * @note Memory area must not contain any allocated elements to allow its
  *   removal from the heap
  *
+ * @note All other processes must detach from the memory chunk prior to it being
+ *   removed from the heap.
+ *
  * @param heap_name
  *   Name of the heap to remove memory from
  * @param va_addr
@@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 int __rte_experimental
 rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Detach from a chunk of external memory in secondary process.
+ *
+ * @note This function must be called before any attempt is made to remove
+ *   external memory from the heap in another process. This function does *not*
+ *   need to be called if ``rte_malloc_heap_memory_remove`` will be called in
+ *   the current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to detach from
+ * @param len
+ *   Length of memory chunk to detach from
+ * @return
+ *   0 on successful detach
+ *   -1 on unsuccessful detach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to detach memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index bc22d21e4..e9be179d5 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -397,10 +397,11 @@ struct sync_mem_walk_arg {
 	void *va_addr;
 	size_t len;
 	int result;
+	bool attach;
 };
 
 static int
-attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct sync_mem_walk_arg *wa = arg;
@@ -415,7 +416,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		if (wa->attach)
+			ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		else
+			ret = rte_fbarray_detach(&found_msl->memseg_arr);
 
 		if (ret < 0)
 			wa->result = -rte_errno;
@@ -426,8 +430,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 	return 0;
 }
 
-int
-rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+static int
+sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
@@ -461,9 +465,10 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	wa.va_addr = va_addr;
 	wa.len = len;
 	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+	wa.attach = attach;
 
 	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
 
 	if (wa.result < 0) {
 		rte_errno = -wa.result;
@@ -476,6 +481,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, true);
+}
+
+int
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, false);
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 822c5693a..73fecb912 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -316,6 +316,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
 	rte_malloc_heap_memory_attach;
+	rte_malloc_heap_memory_detach;
 	rte_malloc_heap_memory_remove;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1


* [dpdk-dev] [PATCH v2 16/20] test: add unit tests for external memory support
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (32 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 15/20] malloc: allow detaching from external memory Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 17/20] examples: add external memory example app Anatoly Burakov
                   ` (3 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

Add simple unit tests to test external memory support.
The tests are pretty basic and mostly consist of checking
if invalid API calls are handled correctly, plus a simple
allocation/deallocation test for malloc and memzone.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
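
Note for readers: the tests take an address, a length, a page size and
an IOVA table as inputs. The test source includes <sys/mman.h> and
defines a 4M EXTERNAL_MEM_SZ, so the region is presumably obtained with
mmap(); the standalone sketch below shows one way such inputs could be
prepared. It is an assumption, not the code from this patch:
struct model_extmem and both helpers are hypothetical, the system page
size is used as the page size, and MODEL_BAD_IOVA is a placeholder for
an "IOVA unknown" marker.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define MODEL_MEM_SZ   ((size_t)4096 << 10) /* 4M, as in the test */
#define MODEL_BAD_IOVA ((uint64_t)-1)       /* assumed "unknown" marker */

struct model_extmem {
	void *addr;           /* start of the mapped region */
	size_t len;           /* region length in bytes */
	size_t pgsz;          /* page size used to split the region */
	unsigned int n_pages; /* len / pgsz */
	uint64_t *iova;       /* one entry per page */
};

/* Map an anonymous region and build a matching per-page IOVA table. */
static int
model_extmem_prepare(struct model_extmem *m)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	unsigned int i;

	if (pgsz <= 0 || MODEL_MEM_SZ % (size_t)pgsz != 0)
		return -1;

	m->addr = mmap(NULL, MODEL_MEM_SZ, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (m->addr == MAP_FAILED)
		return -1;

	m->len = MODEL_MEM_SZ;
	m->pgsz = (size_t)pgsz;
	m->n_pages = (unsigned int)(m->len / m->pgsz);

	m->iova = malloc(m->n_pages * sizeof(*m->iova));
	if (m->iova == NULL) {
		munmap(m->addr, m->len);
		return -1;
	}
	/* no real IOVA is known for anonymous memory in this sketch */
	for (i = 0; i < m->n_pages; i++)
		m->iova[i] = MODEL_BAD_IOVA;

	return 0;
}

static void
model_extmem_release(struct model_extmem *m)
{
	free(m->iova);
	munmap(m->addr, m->len);
}
```

The page count passed alongside the IOVA table has to equal len / pgsz,
which is the condition the "wrong page count" cases in the test probe.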
 test/test/Makefile            |   1 +
 test/test/autotest_data.py    |  14 +-
 test/test/meson.build         |   1 +
 test/test/test_external_mem.c | 389 ++++++++++++++++++++++++++++++++++
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 test/test/test_external_mem.c

diff --git a/test/test/Makefile b/test/test/Makefile
index e6967bab6..074ac6e03 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c
 SRCS-y += test_reciprocal_division.c
 SRCS-y += test_reciprocal_division_perf.c
 SRCS-y += test_fbarray.c
+SRCS-y += test_external_mem.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py
index f68d9b111..51f8e1689 100644
--- a/test/test/autotest_data.py
+++ b/test/test/autotest_data.py
@@ -477,10 +477,16 @@
         "Report":  None,
     },
     {
-        "Name":    "Fbarray autotest",
-        "Command": "fbarray_autotest",
-        "Func":    default_autotest,
-        "Report":  None,
+	"Name":    "Fbarray autotest",
+	"Command": "fbarray_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
+    },
+    {
+	"Name":    "External memory autotest",
+	"Command": "external_mem_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
     },
     #
     #Please always keep all dump tests at the end and together!
diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6eca2..3abf02b71 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -155,6 +155,7 @@ test_names = [
 	'eventdev_common_autotest',
 	'eventdev_octeontx_autotest',
 	'eventdev_sw_autotest',
+	'external_mem_autotest',
 	'func_reentrancy_autotest',
 	'flow_classify_autotest',
 	'hash_scaling_autotest',
diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c
new file mode 100644
index 000000000..d0837aa35
--- /dev/null
+++ b/test/test/test_external_mem.c
@@ -0,0 +1,389 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+
+#include "test.h"
+
+#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */
+
+static int
+test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova,
+		int n_pages)
+{
+	static const char * const names[] = {
+		NULL, /* NULL name */
+		"",   /* empty name */
+		"this heap name is definitely way too long to be valid"
+	};
+	const char *valid_name = "valid heap name";
+	unsigned int i;
+
+	/* check invalid name handling */
+	for (i = 0; i < RTE_DIM(names); i++) {
+		const char *name = names[i];
+
+		/* these calls may fail for other reasons, so check errno */
+		if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Created heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Destroyed heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_get_socket(name) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Found socket for heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_add(name, addr, len,
+				NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Added memory to heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Removed memory from heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Attached memory to heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Detached memory from heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* do same as above, but with a valid heap name */
+
+	/* skip create call */
+	if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Destroyed non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_get_socket(valid_name) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Found socket for non-existent heap\n",
+				__func__, __LINE__);
+		goto fail;
+	}
+
+	/* these calls may fail for other reasons, so check errno */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Added memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Removed memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Attached memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Detached memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* create a valid heap but test other invalid parameters */
+	if (rte_malloc_heap_create(valid_name) != 0) {
+		printf("%s():%i: Failed to create valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero length */
+	if (rte_malloc_heap_memory_add(valid_name, addr, 0,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* NULL address */
+	if (rte_malloc_heap_memory_add(valid_name, NULL, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* wrong page count */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* tests passed, destroy heap */
+	if (rte_malloc_heap_destroy(valid_name) != 0) {
+		printf("%s():%i: Failed to destroy valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	return 0;
+fail:
+	rte_malloc_heap_destroy(valid_name);
+	return -1;
+}
+
+static int
+test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages)
+{
+	const char *heap_name = "heap";
+	void *ptr = NULL;
+	int socket_id, i;
+	const struct rte_memzone *mz = NULL;
+
+	/* create heap */
+	if (rte_malloc_heap_create(heap_name) != 0) {
+		printf("%s():%i: Failed to create malloc heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* get socket ID corresponding to this heap */
+	socket_id = rte_malloc_heap_get_socket(heap_name);
+	if (socket_id < 0) {
+		printf("%s():%i: cannot find socket for external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* heap is empty, so any allocation should fail */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr != NULL) {
+		printf("%s():%i: Allocated from empty heap\n", __func__,
+			__LINE__);
+		goto fail;
+	}
+
+	/* add memory to heap */
+	if (rte_malloc_heap_memory_add(heap_name, addr, len,
+			iova, n_pages, pgsz) != 0) {
+		printf("%s():%i: Failed to add memory to heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check that we can get this memory from EAL now */
+	for (i = 0; i < n_pages; i++) {
+		const struct rte_memseg *ms;
+		void *cur = RTE_PTR_ADD(addr, pgsz * i);
+
+		ms = rte_mem_virt2memseg(cur, NULL);
+		if (ms == NULL) {
+			printf("%s():%i: Failed to retrieve memseg for external mem\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (ms->addr != cur) {
+			printf("%s():%i: VA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+		if (ms->iova != iova[i]) {
+			printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* allocate - this now should succeed */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr == NULL) {
+		printf("%s():%i: Failed to allocate from external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check if address is in expected range */
+	if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) {
+		printf("%s():%i: Allocated from unexpected address space\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* we've allocated something - removing memory should fail */
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 ||
+			rte_errno != EBUSY) {
+		printf("%s():%i: Removing memory succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) {
+		printf("%s():%i: Destroying heap succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* try allocating an IOVA-contiguous memzone - this should succeed
+	 * because we've set up a contiguous IOVA table.
+	 */
+	mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id,
+			RTE_MEMZONE_IOVA_CONTIG);
+	if (mz == NULL) {
+		printf("%s():%i: Failed to reserve memzone\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	rte_malloc_dump_stats(stdout, NULL);
+	rte_malloc_dump_heaps(stdout);
+
+	/* free memory - removing it should now succeed */
+	rte_free(ptr);
+	ptr = NULL;
+
+	rte_memzone_free(mz);
+	mz = NULL;
+
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) {
+		printf("%s():%i: Removing memory from heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) != 0) {
+		printf("%s():%i: Destroying heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	return 0;
+fail:
+	rte_memzone_free(mz);
+	rte_free(ptr);
+	/* even if something failed, attempt to clean up */
+	rte_malloc_heap_memory_remove(heap_name, addr, len);
+	rte_malloc_heap_destroy(heap_name);
+
+	return -1;
+}
+
+/* we need to test attach/detach in secondary processes. */
+static int
+test_external_mem(void)
+{
+	size_t len = EXTERNAL_MEM_SZ;
+	size_t pgsz = RTE_PGSIZE_4K;
+	rte_iova_t iova[len / pgsz];
+	void *addr;
+	int ret, n_pages;
+
+	/* create external memory area */
+	n_pages = RTE_DIM(iova);
+	addr = mmap(NULL, len, PROT_WRITE | PROT_READ,
+			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED) {
+		printf("%s():%i: Failed to create dummy memory area\n",
+			__func__, __LINE__);
+		return -1;
+	}
+	for (int i = 0; i < n_pages; i++) {
+		/* arbitrary IOVA */
+		rte_iova_t tmp = 0x100000000 + i * pgsz;
+		iova[i] = tmp;
+	}
+
+	ret = test_invalid_param(addr, len, pgsz, iova, n_pages);
+	ret |= test_basic(addr, len, pgsz, iova, n_pages);
+
+	munmap(addr, len);
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem);
-- 
2.17.1

* [dpdk-dev] [PATCH v2 17/20] examples: add external memory example app
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (33 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 16/20] test: add unit tests for external memory support Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 18/20] doc: add external memory feature to the release notes Anatoly Burakov
                   ` (2 subsequent siblings)
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs

Introduce an example application demonstrating the use of
external memory support. This is a simple application based on
the skeleton app, but instead of using internal DPDK memory, it
uses externally allocated memory.

The RX/TX and init paths are a carbon copy of the skeleton app, with
no modifications whatsoever. The only differences are an additional
init stage to allocate memory and create a heap for it, and the
socket ID supplied to the mempool initialization function. The
memory used by this app is hugepage memory allocated anonymously.

Anonymous hugepage memory will not be allocated in a NUMA-aware
fashion, so there is a chance of performance degradation when
using this app, but given that the kernel usually hands out hugepages
from the local socket first, this should not be a problem in most cases.
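
For reference, the page size hint passed to mmap() when requesting
anonymous hugepages can be sketched as below. This is a minimal
standalone illustration, not the code in extmem.c: the helper names
pgsz_log2() and pgsz_to_mmap_flags() are hypothetical, and the fallback
value of 26 for MAP_HUGE_SHIFT is assumed to match the Linux definition.

```c
#include <stdint.h>

/* Fallback for older headers; 26 is assumed to match Linux. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif

/* Hypothetical helper: log2 of a power-of-two page size. */
static inline unsigned int
pgsz_log2(uint64_t page_sz)
{
	unsigned int n = 0;

	while (page_sz > 1) {
		page_sz >>= 1;
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: encode a page size into mmap() flag bits,
 * i.e. log2(page size) shifted left by MAP_HUGE_SHIFT.
 */
static inline unsigned int
pgsz_to_mmap_flags(uint64_t page_sz)
{
	return pgsz_log2(page_sz) << MAP_HUGE_SHIFT;
}
```

The result would be OR-ed into the mmap() flags together with
MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB. The sketch returns an
unsigned value so that the shift stays well-defined for larger page
sizes such as 16G.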

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 examples/external_mem/Makefile    |  62 ++++
 examples/external_mem/extmem.c    | 461 ++++++++++++++++++++++++++++++
 examples/external_mem/meson.build |  12 +
 3 files changed, 535 insertions(+)
 create mode 100644 examples/external_mem/Makefile
 create mode 100644 examples/external_mem/extmem.c
 create mode 100644 examples/external_mem/meson.build

diff --git a/examples/external_mem/Makefile b/examples/external_mem/Makefile
new file mode 100644
index 000000000..3b6ab3b2f
--- /dev/null
+++ b/examples/external_mem/Makefile
@@ -0,0 +1,62 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2010-2018 Intel Corporation
+
+# binary name
+APP = extmem
+
+# all source are stored in SRCS-y
+SRCS-y := extmem.c
+
+# Build using pkg-config variables if possible
+$(shell pkg-config --exists libdpdk)
+ifeq ($(.SHELLSTATUS),0)
+
+all: shared
+.PHONY: shared static
+shared: build/$(APP)-shared
+	ln -sf $(APP)-shared build/$(APP)
+static: build/$(APP)-static
+	ln -sf $(APP)-static build/$(APP)
+
+PC_FILE := $(shell pkg-config --path libdpdk)
+CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
+LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
+
+build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
+	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
+
+build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
+	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
+
+build:
+	@mkdir -p $@
+
+.PHONY: clean
+clean:
+	rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
+	rmdir --ignore-fail-on-non-empty build
+
+else # Build using legacy build system
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_extmem.o += -Wno-return-type
+endif
+
+include $(RTE_SDK)/mk/rte.extapp.mk
+endif
diff --git a/examples/external_mem/extmem.c b/examples/external_mem/extmem.c
new file mode 100644
index 000000000..818a02171
--- /dev/null
+++ b/examples/external_mem/extmem.c
@@ -0,0 +1,461 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2018 Intel Corporation
+ */
+
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+#include <rte_eal.h>
+#include <rte_ethdev.h>
+#include <rte_cycles.h>
+#include <rte_lcore.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
+#include <rte_vfio.h>
+
+#define RX_RING_SIZE 1024
+#define TX_RING_SIZE 1024
+
+#define NUM_MBUFS 8191
+#define MBUF_CACHE_SIZE 250
+#define BURST_SIZE 32
+#define EXTMEM_HEAP_NAME "extmem"
+
+static const struct rte_eth_conf port_conf_default = {
+	.rxmode = {
+		.max_rx_pkt_len = ETHER_MAX_LEN,
+	},
+};
+
+/* extmem.c: Basic DPDK skeleton forwarding example using external memory. */
+
+/*
+ * Initializes a given port using global settings and with the RX buffers
+ * coming from the mbuf_pool passed as a parameter.
+ */
+static inline int
+port_init(uint16_t port, struct rte_mempool *mbuf_pool)
+{
+	struct rte_eth_conf port_conf = port_conf_default;
+	const uint16_t rx_rings = 1, tx_rings = 1;
+	uint16_t nb_rxd = RX_RING_SIZE;
+	uint16_t nb_txd = TX_RING_SIZE;
+	int retval;
+	uint16_t q;
+	struct rte_eth_dev_info dev_info;
+	struct rte_eth_txconf txconf;
+
+	if (!rte_eth_dev_is_valid_port(port))
+		return -1;
+
+	rte_eth_dev_info_get(port, &dev_info);
+	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE)
+		port_conf.txmode.offloads |=
+			DEV_TX_OFFLOAD_MBUF_FAST_FREE;
+
+	/* Configure the Ethernet device. */
+	retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
+	if (retval != 0)
+		return retval;
+
+	retval = rte_eth_dev_adjust_nb_rx_tx_desc(port, &nb_rxd, &nb_txd);
+	if (retval != 0)
+		return retval;
+
+	/* Allocate and set up 1 RX queue per Ethernet port. */
+	for (q = 0; q < rx_rings; q++) {
+		retval = rte_eth_rx_queue_setup(port, q, nb_rxd,
+				rte_eth_dev_socket_id(port), NULL, mbuf_pool);
+		if (retval < 0)
+			return retval;
+	}
+
+	txconf = dev_info.default_txconf;
+	txconf.offloads = port_conf.txmode.offloads;
+	/* Allocate and set up 1 TX queue per Ethernet port. */
+	for (q = 0; q < tx_rings; q++) {
+		retval = rte_eth_tx_queue_setup(port, q, nb_txd,
+				rte_eth_dev_socket_id(port), &txconf);
+		if (retval < 0)
+			return retval;
+	}
+
+	/* Start the Ethernet port. */
+	retval = rte_eth_dev_start(port);
+	if (retval < 0)
+		return retval;
+
+	/* Display the port MAC address. */
+	struct ether_addr addr;
+	rte_eth_macaddr_get(port, &addr);
+	printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8
+			   " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n",
+			port,
+			addr.addr_bytes[0], addr.addr_bytes[1],
+			addr.addr_bytes[2], addr.addr_bytes[3],
+			addr.addr_bytes[4], addr.addr_bytes[5]);
+
+	/* Enable RX in promiscuous mode for the Ethernet device. */
+	rte_eth_promiscuous_enable(port);
+
+	return 0;
+}
+
+/*
+ * The lcore main. This is the main thread that does the work, reading from
+ * an input port and writing to an output port.
+ */
+static __attribute__((noreturn)) void
+lcore_main(void)
+{
+	uint16_t port;
+
+	/*
+	 * Check that the port is on the same NUMA node as the polling thread
+	 * for best performance.
+	 */
+	RTE_ETH_FOREACH_DEV(port)
+		if (rte_eth_dev_socket_id(port) > 0 &&
+				rte_eth_dev_socket_id(port) !=
+						(int)rte_socket_id())
+			printf("WARNING, port %u is on remote NUMA node to "
+					"polling thread.\n\tPerformance will "
+					"not be optimal.\n", port);
+
+	printf("\nCore %u forwarding packets. [Ctrl+C to quit]\n",
+			rte_lcore_id());
+
+	/* Run until the application is quit or killed. */
+	for (;;) {
+		/*
+		 * Receive packets on a port and forward them on the paired
+		 * port. The mapping is 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, etc.
+		 */
+		RTE_ETH_FOREACH_DEV(port) {
+
+			/* Get burst of RX packets, from first port of pair. */
+			struct rte_mbuf *bufs[BURST_SIZE];
+			const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
+					bufs, BURST_SIZE);
+
+			if (unlikely(nb_rx == 0))
+				continue;
+
+			/* Send burst of TX packets, to second port of pair. */
+			const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
+					bufs, nb_rx);
+
+			/* Free any unsent packets. */
+			if (unlikely(nb_tx < nb_rx)) {
+				uint16_t buf;
+				for (buf = nb_tx; buf < nb_rx; buf++)
+					rte_pktmbuf_free(bufs[buf]);
+			}
+		}
+	}
+}
+
+/* extremely pessimistic estimation of memory required to create a mempool */
+static int
+calc_mem_size(uint32_t nb_ports, uint32_t nb_mbufs_per_port,
+		uint32_t mbuf_sz, size_t pgsz, size_t *out)
+{
+	uint32_t nb_mbufs = nb_ports * nb_mbufs_per_port;
+	uint64_t total_mem, mbuf_mem, obj_sz;
+
+	/* there is no good way to predict how much space the mempool will
+	 * occupy because it will allocate chunks on the fly, and some of those
+	 * will come from default DPDK memory while some will come from our
+	 * external memory, so just assume 16MB will be enough for everyone.
+	 */
+	uint64_t hdr_mem = 16 << 20;
+
+	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
+	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
+		/* contiguous - no need to account for page boundaries */
+		mbuf_mem = nb_mbufs * obj_sz;
+	} else {
+		/* account for possible non-contiguousness */
+		unsigned int n_pages, mbuf_per_pg, leftover;
+
+		mbuf_per_pg = pgsz / obj_sz;
+		leftover = (nb_mbufs % mbuf_per_pg) > 0;
+		n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
+
+		mbuf_mem = n_pages * pgsz;
+	}
+
+	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
+
+	if (total_mem > SIZE_MAX) {
+		printf("Memory size too big\n");
+		return -1;
+	}
+	*out = (size_t)total_mem;
+
+	return 0;
+}
+
+static inline uint32_t
+bsf64(uint64_t v)
+{
+	return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+	if (v == 0)
+		return 0;
+	v = rte_align64pow2(v);
+	return bsf64(v);
+}
+
+#ifndef MAP_HUGE_SHIFT
+#define HUGE_SHIFT 26
+#else
+#define HUGE_SHIFT MAP_HUGE_SHIFT
+#endif
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+	/* as per mmap() manpage, all page sizes are log2 of page size
+	 * shifted by MAP_HUGE_SHIFT
+	 */
+	int log2 = log2_u64(page_sz);
+	return log2 << HUGE_SHIFT;
+}
+
+static void *
+alloc_mem(size_t memsz, size_t pgsz)
+{
+	void *addr;
+	int flags;
+
+	/* allocate anonymous hugepages */
+	flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB | pagesz_flags(pgsz);
+
+	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
+	if (addr == MAP_FAILED)
+		return NULL;
+
+	return addr;
+}
+
+struct extmem_param {
+	void *addr;
+	size_t len;
+	size_t pgsz;
+	rte_iova_t *iova_table;
+	unsigned int iova_table_len;
+};
+
+static int
+create_extmem(uint32_t nb_ports, uint32_t nb_mbufs_per_port, uint32_t mbuf_sz,
+		struct extmem_param *param)
+{
+	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
+			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
+	unsigned int n_pages, cur_page, pgsz_idx;
+	size_t mem_sz, offset, cur_pgsz;
+	bool vfio_supported = true;
+	rte_iova_t *iovas = NULL;
+	void *addr;
+	int ret;
+
+	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
+		/* skip anything that is too big */
+		if (pgsizes[pgsz_idx] > SIZE_MAX)
+			continue;
+
+		cur_pgsz = pgsizes[pgsz_idx];
+
+		ret = calc_mem_size(nb_ports, nb_mbufs_per_port,
+				mbuf_sz, cur_pgsz, &mem_sz);
+		if (ret < 0) {
+			printf("Cannot calculate memory size\n");
+			return -1;
+		}
+
+		/* allocate our memory */
+		addr = alloc_mem(mem_sz, cur_pgsz);
+
+		/* if we couldn't allocate memory with a specified page size,
+		 * that doesn't mean we can't do it with other page sizes, so
+		 * try another one.
+		 */
+		if (addr == NULL)
+			continue;
+
+		/* store IOVA addresses for every page in this memory area */
+		n_pages = mem_sz / cur_pgsz;
+
+		iovas = malloc(sizeof(*iovas) * n_pages);
+
+		if (iovas == NULL) {
+			printf("Cannot allocate memory for iova addresses\n");
+			goto fail;
+		}
+
+		/* populate IOVA table */
+		for (cur_page = 0; cur_page < n_pages; cur_page++) {
+			rte_iova_t iova;
+			void *cur;
+
+			offset = cur_pgsz * cur_page;
+			cur = RTE_PTR_ADD(addr, offset);
+
+			iova = (uintptr_t)rte_mem_virt2iova(cur);
+
+			iovas[cur_page] = iova;
+
+			if (vfio_supported) {
+				/* map memory for DMA */
+				ret = rte_vfio_dma_map((uintptr_t)cur,
+						iova, cur_pgsz);
+				if (ret < 0) {
+					/*
+					 * ENODEV means VFIO is not initialized
+					 * ENOTSUP means current IOMMU mode
+					 * doesn't support mapping
+					 * both cases are not an error
+					 */
+					if (rte_errno == ENOTSUP ||
+							rte_errno == ENODEV)
+						/* VFIO is unsupported, don't
+						 * try again.
+						 */
+						vfio_supported = false;
+					else
+						/* this is an actual error */
+						goto fail;
+				}
+			}
+		}
+
+		break;
+	}
+	/* if we couldn't allocate anything */
+	if (iovas == NULL)
+		return -1;
+
+	param->addr = addr;
+	param->len = mem_sz;
+	param->pgsz = cur_pgsz;
+	param->iova_table = iovas;
+	param->iova_table_len = n_pages;
+
+	return 0;
+fail:
+	if (iovas)
+		free(iovas);
+	if (addr)
+		munmap(addr, mem_sz);
+
+	return -1;
+}
+
+static int
+setup_extmem(uint32_t nb_ports, uint32_t nb_mbufs_per_port, uint32_t mbuf_sz)
+{
+	struct extmem_param param;
+	int ret;
+
+	/* create our heap */
+	ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
+	if (ret < 0) {
+		printf("Cannot create heap\n");
+		return -1;
+	}
+
+	ret = create_extmem(nb_ports, nb_mbufs_per_port, mbuf_sz, &param);
+	if (ret < 0) {
+		printf("Cannot create memory area\n");
+		return -1;
+	}
+
+	/* we now have a valid memory area, so add it to heap */
+	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
+			param.addr, param.len, param.iova_table,
+			param.iova_table_len, param.pgsz);
+
+	/* not needed any more */
+	free(param.iova_table);
+
+	if (ret < 0) {
+		printf("Cannot add memory to heap\n");
+		munmap(param.addr, param.len);
+		return -1;
+	}
+
+	printf("Allocated %zuMB of memory\n", param.len >> 20);
+
+	/* success */
+	return 0;
+}
+
+
+/*
+ * The main function, which does initialization and calls the per-lcore
+ * functions.
+ */
+int
+main(int argc, char *argv[])
+{
+	struct rte_mempool *mbuf_pool;
+	unsigned int nb_ports;
+	int socket_id;
+	uint16_t portid;
+	uint32_t nb_mbufs_per_port, mbuf_sz;
+
+	/* Initialize the Environment Abstraction Layer (EAL). */
+	int ret = rte_eal_init(argc, argv);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
+
+	argc -= ret;
+	argv += ret;
+
+	/* Check that there is an even number of ports to send/receive on. */
+	nb_ports = rte_eth_dev_count_avail();
+	if (nb_ports < 2 || (nb_ports & 1))
+		rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
+
+	nb_mbufs_per_port = NUM_MBUFS;
+	mbuf_sz = RTE_MBUF_DEFAULT_BUF_SIZE;
+
+	if (setup_extmem(nb_ports, nb_mbufs_per_port, mbuf_sz) < 0)
+		rte_exit(EXIT_FAILURE, "Error: cannot set up external memory\n");
+
+	/* retrieve socket ID for our heap */
+	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+	if (socket_id < 0)
+		rte_exit(EXIT_FAILURE, "Invalid socket for external heap\n");
+
+	/* Creates a new mempool in memory to hold the mbufs. */
+	mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL",
+			nb_mbufs_per_port * nb_ports, MBUF_CACHE_SIZE, 0,
+			mbuf_sz, socket_id);
+
+	if (mbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
+
+	/* Initialize all ports. */
+	RTE_ETH_FOREACH_DEV(portid)
+		if (port_init(portid, mbuf_pool) != 0)
+			rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu16 "\n",
+					portid);
+
+	if (rte_lcore_count() > 1)
+		printf("\nWARNING: Too many lcores enabled. Only 1 used.\n");
+
+	/* Call lcore_main on the master core only. */
+	lcore_main();
+
+	return 0;
+}
diff --git a/examples/external_mem/meson.build b/examples/external_mem/meson.build
new file mode 100644
index 000000000..17a363ad2
--- /dev/null
+++ b/examples/external_mem/meson.build
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 Intel Corporation
+
+# meson file, for building this example as part of a main DPDK build.
+#
+# To build this example as a standalone application with an already-installed
+# DPDK instance, use 'make'
+
+allow_experimental_apis = true
+sources = files(
+	'extmem.c'
+)
-- 
2.17.1

* [dpdk-dev] [PATCH v2 18/20] doc: add external memory feature to the release notes
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (34 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 17/20] examples: add external memory example app Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 19/20] doc: add external memory feature to programmer's guide Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 20/20] doc: add external memory sample application guide Anatoly Burakov
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs

Document the addition of external memory support to DPDK.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index c04685d17..cc5b582f8 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -54,6 +54,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added support for using externally allocated memory in DPDK.**
+
+  DPDK has gained support for creating new ``rte_malloc`` heaps referencing
+  memory that was created outside of DPDK's own page allocator, and using that
+  memory natively with any other DPDK library or data structure.
 
 API Changes
 -----------
-- 
2.17.1

* [dpdk-dev] [PATCH v2 19/20] doc: add external memory feature to programmer's guide
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (35 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 18/20] doc: add external memory feature to the release notes Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 20/20] doc: add external memory sample application guide Anatoly Burakov
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs

Add a short chapter on usage of external memory in DPDK to the
Programmer's Guide.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d362c9209..37de8d63d 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -213,6 +213,44 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
+Support for Externally Allocated Memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to use externally allocated memory in DPDK, using a set of malloc
+heap APIs. Support for externally allocated memory is implemented through
+overloading the socket ID - externally allocated heaps will have socket IDs
+that would be considered invalid under normal circumstances. Requesting an
+allocation to take place from a specified externally allocated memory area is a
+matter of supplying the correct socket ID to the DPDK allocator, either directly
+(e.g. through a call to ``rte_malloc_socket``) or indirectly (through data
+structure-specific allocation APIs such as ``rte_ring_create``).
+
+Since there is no way DPDK can verify whether a memory area is available or
+valid, this responsibility falls on the shoulders of the user. All multiprocess
+synchronization is also the user's responsibility, as well as ensuring that all
+calls to add/attach/detach/remove memory are done in the correct order. It is
+not required to attach to a memory area in all processes - only attach to memory
+areas as needed.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Create a named heap
+* Add memory area(s) to the heap
+  * If IOVA table is not specified, IOVA addresses will be assumed to be
+    unavailable
+  * Any DMA mappings for the external area are the responsibility of the user
+  * Other processes must attach to the memory area before they can use it
+* Get socket ID used for the heap
+* Use normal DPDK allocation procedures, using supplied socket ID
+* If memory area is no longer needed, it can be removed from the heap
+  * Other processes must detach from this memory area before it can be removed
+* If heap is no longer needed, remove it
+  * Socket ID will become invalid and will not be reused
+
+For more information, please refer to ``rte_malloc`` API documentation,
+specifically the ``rte_malloc_heap_*`` family of function calls.
+
 PCI Access
 ~~~~~~~~~~
 
-- 
2.17.1

* [dpdk-dev] [PATCH v2 20/20] doc: add external memory sample application guide
  2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
                   ` (36 preceding siblings ...)
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 19/20] doc: add external memory feature to programmer's guide Anatoly Burakov
@ 2018-09-19 13:56 ` Anatoly Burakov
  37 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs

Add a guide for the external memory sample application. The
application is identical to the Basic Forwarding example in everything
except parts of the initialization code, so the identical parts are
not described.

It is also not necessary to describe how external memory is
allocated, as the expectation is that users will have their own
mechanisms to allocate memory outside of DPDK, and will only be
interested in how to integrate said memory into DPDK.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/sample_app_ug/external_mem.rst | 115 ++++++++++++++++++++++
 doc/guides/sample_app_ug/index.rst        |   1 +
 2 files changed, 116 insertions(+)
 create mode 100644 doc/guides/sample_app_ug/external_mem.rst

diff --git a/doc/guides/sample_app_ug/external_mem.rst b/doc/guides/sample_app_ug/external_mem.rst
new file mode 100644
index 000000000..594c3397a
--- /dev/null
+++ b/doc/guides/sample_app_ug/external_mem.rst
@@ -0,0 +1,115 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2015-2018 Intel Corporation.
+
+External Memory Sample Application
+==================================
+
+The External Memory sample application is a simple *skeleton* example of a
+forwarding application using externally allocated memory.
+
+It is intended as a demonstration of the basic workflow of using externally
+allocated memory in DPDK. This application is based on the Basic Forwarding
+sample application, and differs only in its initialization path. For a more
+detailed explanation of port initialization and packet forwarding code, please
+refer to the *Basic Forwarding sample application user guide*.
+
+Compiling the Application
+-------------------------
+
+To compile the sample application see :doc:`compiling`.
+
+The application is located in the ``external_mem`` sub-directory.
+
+Running the Application
+-----------------------
+
+To run the example in a ``linuxapp`` environment:
+
+.. code-block:: console
+
+    ./build/extmem -l 1 -n 4
+
+Refer to *DPDK Getting Started Guide* for general information on running
+applications and the Environment Abstraction Layer (EAL) options.
+
+
+Explanation
+-----------
+
+For a general overview of the code and an explanation of the main components of
+this application, please refer to the
+*Basic Forwarding sample application user guide*. This guide will only explain
+sections of the code relevant to using external memory in DPDK.
+
+All DPDK library functions used in the sample code are prefixed with ``rte_``
+and are explained in detail in the *DPDK API Documentation*.
+
+
+External Memory Initialization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``main()`` function performs the initialization and calls the execution
+threads for each lcore.
+
+After initializing the Environment Abstraction Layer, the application also
+initializes external memory (in this case, it's allocating a chunk of memory
+using anonymous hugepages) inside the ``setup_extmem()`` local function.
+
+The first step in this process is to create an external heap:
+
+.. code-block:: c
+
+    ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
+    if (ret < 0) {
+        printf("Cannot create heap\n");
+        return -1;
+    }
+
+Once the heap is created, the ``create_extmem`` function is called to create the
+actual external memory area the application will be using. The details of that
+process are not described here, as they are not pertinent to the external
+memory API (it is expected that the user will have their own procedures to
+create external memory), but there are a few important things to note.
+
+In order to add an externally allocated memory area to the newly created heap,
+the application needs the following pieces of information:
+
+* Pointer to start address of external memory area
+* Length of this area
+* Page size of memory backing this memory area
+* Optionally, a per-page IOVA table
+
+All of this information is to be provided by the user. Additionally, if VFIO is
+in use and the application intends to do DMA using the memory area, VFIO DMA
+mapping must also be performed using the ``rte_vfio_dma_map`` function.
+
+Once the external memory is created and mapped for DMA, the application also has
+to add this memory to the heap that was created earlier:
+
+.. code-block:: c
+
+    ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
+            param.addr, param.len, param.iova_table,
+            param.iova_table_len, param.pgsz);
+
+If the return value indicates success, the memory area has been successfully
+added to the heap. The next step is to retrieve the socket ID of this heap:
+
+.. code-block:: c
+
+    socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+    if (socket_id < 0)
+        rte_exit(EXIT_FAILURE, "Invalid socket for external heap\n");
+
+After that, the socket ID has to be supplied to the mempool creation function:
+
+.. code-block:: c
+
+    mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL",
+        nb_mbufs_per_port * nb_ports, MBUF_CACHE_SIZE, 0,
+        mbuf_sz, socket_id);
+
+The rest of the application is identical to the Basic Forwarding example.
+
+The forwarding loop can be interrupted and the application closed using
+``Ctrl-C``.
diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
index 5bedf4f6f..0536edb8e 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -15,6 +15,7 @@ Sample Applications User Guides
     exception_path
     hello_world
     skeleton
+    external_mem
     rxtx_callbacks
     flow_classify
     flow_filtering
-- 
2.17.1


* Re: [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-20  9:30   ` Andrew Rybchenko
  2018-09-20  9:54     ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Andrew Rybchenko @ 2018-09-20  9:30 UTC (permalink / raw)
  To: Anatoly Burakov, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	thomas

On 9/19/18 4:56 PM, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
>
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes.
>
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
>
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

A couple of minor questions/suggestions below, but it is OK to
go as is even if rejected.

Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

<...>

> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 03e6b5f73..d61c77da3 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size)
>   	return new_obj_size * RTE_MEMPOOL_ALIGN;
>   }
>   
> +struct pagesz_walk_arg {
> +	int socket_id;
> +	size_t min;
> +};
> +
>   static int
>   find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
>   {
> -	size_t *min = arg;
> +	struct pagesz_walk_arg *wa = arg;
> +	bool valid;
>   
> -	if (msl->page_sz < *min)
> -		*min = msl->page_sz;
> +	valid = msl->socket_id == wa->socket_id;

Is it intended that we accept externally allocated segment
if it is on requested socket? If so, it would be good to add
comment to explain why.

> +	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
> +
> +	if (!valid)
> +		return 0;
> +
> +	if (msl->page_sz < wa->min)
> +		wa->min = msl->page_sz;

I'd suggest to keep single return (it is just a bit shorter)
if (valid && msl->page_sz < wa->min)
          wa->min = msl->page_sz;

<...>


* Re: [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
  2018-09-20  9:30   ` Andrew Rybchenko
@ 2018-09-20  9:54     ` Burakov, Anatoly
  0 siblings, 0 replies; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-20  9:54 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	thomas

On 20-Sep-18 10:30 AM, Andrew Rybchenko wrote:
> On 9/19/18 4:56 PM, Anatoly Burakov wrote:
>> When we allocate and use DPDK memory, we need to be able to
>> differentiate between DPDK hugepage segments and segments that
>> were made part of DPDK but are externally allocated. Add such
>> a property to memseg lists.
>>
>> This breaks the ABI, so bump the EAL library ABI version and
>> document the change in release notes.
>>
>> All current calls for memseg walk functions were adjusted to
>> ignore external segments where it made sense.
>>
>> Mempools is a special case, because we may be asked to allocate
>> a mempool on a specific socket, and we need to ignore all page
>> sizes on other heaps or other sockets. Previously, this
>> assumption of knowing all page sizes was not a problem, but it
>> will be now, so we have to match socket ID with page size when
>> calculating minimum page size for a mempool.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> A couple of minor questions/suggestions below, but it is OK to
> go as is even if rejected.
> 
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> <...>
> 
>> diff --git a/lib/librte_mempool/rte_mempool.c 
>> b/lib/librte_mempool/rte_mempool.c
>> index 03e6b5f73..d61c77da3 100644
>> --- a/lib/librte_mempool/rte_mempool.c
>> +++ b/lib/librte_mempool/rte_mempool.c
>> @@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned 
>> obj_size)
>>       return new_obj_size * RTE_MEMPOOL_ALIGN;
>>   }
>> +struct pagesz_walk_arg {
>> +    int socket_id;
>> +    size_t min;
>> +};
>> +
>>   static int
>>   find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
>>   {
>> -    size_t *min = arg;
>> +    struct pagesz_walk_arg *wa = arg;
>> +    bool valid;
>> -    if (msl->page_sz < *min)
>> -        *min = msl->page_sz;
>> +    valid = msl->socket_id == wa->socket_id;
> 
> Is it intended that we accept externally allocated segment
> if it is on requested socket? If so, it would be good to add
> comment to explain why.

Accepting externally allocated segments is precisely the point here - we 
want to find page size of underlying memory, regardless of whether it's 
internal or external. We use socket ID to identify valid page sizes for 
a particular heap (since socket ID is technically a heap identifier, as 
far as external code is concerned), but within that heap there can be 
multiple segment lists corresponding to that socket ID, each with its 
own page size.
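
The matching rule described above can be sketched in isolation. This is only
an illustrative model, not the mempool code itself: ``struct msl_model`` and
``min_pagesz_for_socket()`` are hypothetical stand-ins for
``struct rte_memseg_list`` and the memseg list walk, and ``SOCKET_ID_ANY`` is
redefined locally so the sketch compiles without DPDK headers. The callback
body follows the patch, using the single-return form suggested in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Redefined locally so the sketch stands alone (same value as in DPDK). */
#define SOCKET_ID_ANY (-1)

/* Hypothetical stand-in for the rte_memseg_list fields used here. */
struct msl_model {
	int socket_id;         /* heap identifier, internal or external */
	uint64_t page_sz;      /* page size for all segments in this list */
	unsigned int external; /* non-zero if externally allocated */
};

struct pagesz_walk_arg {
	int socket_id;
	size_t min;
};

/* Per-list callback: track the smallest page size among matching lists. */
static int
find_min_pagesz(const struct msl_model *msl, void *arg)
{
	struct pagesz_walk_arg *wa = arg;
	bool valid;

	/* A list on the requested socket counts, internal or external. */
	valid = msl->socket_id == wa->socket_id;
	/* With no socket preference, only internal lists count. */
	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;

	if (valid && msl->page_sz < wa->min)
		wa->min = msl->page_sz;

	return 0;
}

/* Hypothetical walker: visits every list, as a memseg list walk would. */
static size_t
min_pagesz_for_socket(const struct msl_model *lists, size_t n, int socket_id)
{
	struct pagesz_walk_arg wa = { .socket_id = socket_id, .min = SIZE_MAX };
	size_t i;

	assert(lists != NULL || n == 0);
	for (i = 0; i < n; i++)
		find_min_pagesz(&lists[i], &wa);

	return wa.min;
}
```

In this model, a request for an external heap's socket ID only ever sees the
page sizes of that heap, while ``SOCKET_ID_ANY`` ignores external lists
altogether. ``SIZE_MAX`` is returned when nothing matched.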

> 
>> +    valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
>> +
>> +    if (!valid)
>> +        return 0;
>> +
>> +    if (msl->page_sz < wa->min)
>> +        wa->min = msl->page_sz;
> 
> I'd suggest to keep single return (it is just a bit shorter)
> if (valid && msl->page_sz < wa->min)
>           wa->min = msl->page_sz;

Sure. If there are other comments that warrant a v3 respin, I'll
incorporate this feedback :)

Thanks for the review!

> 
> <...>
> 


-- 
Thanks,
Anatoly


* [dpdk-dev] [PATCH v3 00/20] Support externally allocated memory in DPDK
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                       ` (20 more replies)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 01/20] mem: add length to memseg list Anatoly Burakov
                     ` (19 subsequent siblings)
  20 siblings, 21 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create a heap and add memory to it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.

Using external memory in multiprocess is *hard*, because not only does
the memory space need to be preallocated, it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

Creating and destroying heaps is currently restricted to primary
processes, because we need to keep track of all socket ID's we've ever
used to prevent their reuse, and obviously different processes would
have kept different socket ID counters, and it isn't important enough
to put into shared memory. This means that secondary processes will
not be able to create new heaps. If this use case is important
enough, we can put the max socket ID into shared memory, or allow
socket ID reuse (which I do not think is a good idea because it has
the potential to make things harder to debug).
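
The socket ID bookkeeping described above can be modelled with a small
stand-alone sketch. Everything in it is hypothetical and only illustrates the
idea: the ``heap_table`` type, the ``heap_create()``/``heap_destroy()`` helpers
and the starting value ``MODEL_FIRST_EXT_SOCKET_ID`` are assumptions, not the
EAL implementation. The point is that the counter only ever grows, so a
destroyed heap's socket ID is never handed out again, and that a process
without the counter (a secondary) cannot create heaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_HEAPS 8
#define MODEL_NAME_LEN 32
/* Assumed first external socket ID; chosen above any real NUMA node index. */
#define MODEL_FIRST_EXT_SOCKET_ID 256

struct heap_model {
	char name[MODEL_NAME_LEN]; /* empty string marks an unused slot */
	int socket_id;
};

struct heap_table {
	struct heap_model heaps[MODEL_MAX_HEAPS];
	int next_socket_id; /* only ever incremented, never reused */
	bool is_primary;    /* only the primary process owns the counter */
};

static void
heap_table_init(struct heap_table *t, bool is_primary)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
	t->next_socket_id = MODEL_FIRST_EXT_SOCKET_ID;
	t->is_primary = is_primary;
}

/* Returns the new heap's socket ID, or -1 on failure. */
static int
heap_create(struct heap_table *t, const char *name)
{
	int i, free_idx = -1;
	size_t len;

	assert(t != NULL);
	if (!t->is_primary || name == NULL)
		return -1;
	len = strlen(name);
	if (len == 0 || len >= MODEL_NAME_LEN)
		return -1;

	for (i = 0; i < MODEL_MAX_HEAPS; i++) {
		if (t->heaps[i].name[0] == '\0') {
			if (free_idx < 0)
				free_idx = i;
		} else if (strcmp(t->heaps[i].name, name) == 0) {
			return -1; /* heap names must be unique */
		}
	}
	if (free_idx < 0)
		return -1;

	memcpy(t->heaps[free_idx].name, name, len + 1);
	t->heaps[free_idx].socket_id = t->next_socket_id++;
	return t->heaps[free_idx].socket_id;
}

/* Returns 0 on success, -1 on failure. The counter is left untouched. */
static int
heap_destroy(struct heap_table *t, const char *name)
{
	int i;

	assert(t != NULL);
	if (!t->is_primary || name == NULL || name[0] == '\0')
		return -1;

	for (i = 0; i < MODEL_MAX_HEAPS; i++) {
		if (strcmp(t->heaps[i].name, name) == 0) {
			memset(&t->heaps[i], 0, sizeof(t->heaps[i]));
			return 0;
		}
	}
	return -1;
}
```

In this model, creating a heap, destroying it, and creating one with the same
name again yields a fresh socket ID, which is what keeps stale socket IDs from
silently pointing at a different heap.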

v2 -> v3 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
  comments

v1 -> v2 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (20):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  test: add unit tests for external memory support
  examples: add external memory example app
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide
  doc: add external memory sample application guide

 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  38 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  24 +-
 doc/guides/sample_app_ug/external_mem.rst     | 115 +++++
 doc/guides/sample_app_ug/index.rst            |   1 +
 drivers/bus/fslmc/fslmc_vfio.c                |   7 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 examples/external_mem/Makefile                |  62 +++
 examples/external_mem/extmem.c                | 461 ++++++++++++++++++
 examples/external_mem/meson.build             |  12 +
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   6 +-
 lib/librte_eal/common/include/rte_malloc.h    | 183 +++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_heap.c           | 300 ++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 393 ++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  17 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   7 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  35 +-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 +++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 45 files changed, 2099 insertions(+), 105 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/external_mem.rst
 create mode 100644 examples/external_mem/Makefile
 create mode 100644 examples/external_mem/extmem.c
 create mode 100644 examples/external_mem/meson.build
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1


* [dpdk-dev] [PATCH v3 01/20] mem: add length to memseg list
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, shreyansh.jain, shahafs, arybchenko

Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.
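
As an illustration only (``struct msl_len_model`` and its helpers are
hypothetical stand-ins, not the EAL structures), the idea is to compute the
length once when the list is set up and then use it directly for range
checks, instead of multiplying page size by the fbarray length at every call
site:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_memseg_list fields involved. */
struct msl_len_model {
	void *base_va;        /* start of the VA area covered by the list */
	uint64_t page_sz;     /* page size of every segment in the list */
	unsigned int n_pages; /* stands in for the backing fbarray length */
	size_t len;           /* stored length of the covered area */
};

/* Compute the length once, at setup time. */
static void
msl_model_init(struct msl_len_model *msl, void *base, uint64_t page_sz,
		unsigned int n_pages)
{
	assert(msl != NULL);
	msl->base_va = base;
	msl->page_sz = page_sz;
	msl->n_pages = n_pages;
	msl->len = (size_t)page_sz * n_pages;
}

/* Range check using the stored length: [base_va, base_va + len). */
static bool
msl_contains(const struct msl_len_model *msl, const void *addr)
{
	uintptr_t start = (uintptr_t)msl->base_va;
	uintptr_t end = start + msl->len;
	uintptr_t p = (uintptr_t)addr;

	return p >= start && p < end;
}
```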

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/bus/pci/linux/pci.c                       | 2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
 lib/librte_eal/common/eal_common_memory.c         | 5 ++---
 lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
 6 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 static int
 find_max_end_va(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t sz = msl->memseg_arr.len * msl->page_sz;
+	size_t sz = msl->len;
 	void *end_va = RTE_PTR_ADD(msl->base_va, sz);
 	void **max_va = arg;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
 		}
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
+		msl->len = internal_config.memory;
 		msl->socket_id = 0;
 
 		/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
 
 	/* a memseg list was specified, check if it's the right one */
 	start = msl->base_va;
-	end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+	end = RTE_PTR_ADD(start, msl->len);
 
 	if (addr < start || addr >= end)
 		return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
 		msl = &mcfg->memsegs[msl_idx];
 
 		start = msl->base_va;
-		end = RTE_PTR_ADD(start,
-				(size_t)msl->page_sz * msl->memseg_arr.len);
+		end = RTE_PTR_ADD(start, msl->len);
 		if (addr >= start && addr < end)
 			break;
 	}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
+	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	int msl_idx, seg_idx, ret, dir_fd = -1;
 
 	start_addr = (uintptr_t) msl->base_va;
-	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+	end_addr = start_addr + msl->len;
 
 	if ((uintptr_t)wa->ms->addr < start_addr ||
 			(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 		return -1;
 	}
 	local_msl->base_va = primary_msl->base_va;
+	local_msl->len = primary_msl->len;
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
 		msl->socket_id = 0;
+		msl->len = internal_config.memory;
 
 		/* populate memsegs. each memseg is one page long */
 		for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
 		if (msl->memseg_arr.count > 0)
 			continue;
 		/* this is an unused list, deallocate it */
-		mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
 
-- 
2.17.1


* [dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 01/20] mem: add length to memseg list Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 03/20] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
                     ` (17 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        | 12 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 22 files changed, 125 insertions(+), 40 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..e96ec9b43 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -107,6 +114,9 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK.
+
 Removed Items
 -------------
 
@@ -152,7 +162,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1
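
For readers of the walk-function notes added to rte_memory.h above, here is a
minimal sketch of the "skip external lists" pattern that this patch applies to
each callback. It is not DPDK code: struct memseg_list, walk_lists() and
internal_size_cb() are hypothetical stand-ins that only mimic the shape of
struct rte_memseg_list and a list-walk callback, so that the example compiles
on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_memseg_list; only the fields used here. */
struct memseg_list {
	uint64_t page_sz;      /* page size for all segments in this list */
	unsigned int count;    /* stands in for memseg_arr.count */
	unsigned int external; /* 1 if this list points to external memory */
};

typedef int (*list_walk_t)(const struct memseg_list *msl, void *arg);

/* Toy walker: call func on every list, stop on the first nonzero return. */
static int
walk_lists(const struct memseg_list *lists, size_t n, list_walk_t func,
		void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Sum the size of internal memory only; external lists are skipped. */
static int
internal_size_cb(const struct memseg_list *msl, void *arg)
{
	uint64_t *total = arg;

	if (msl->external)
		return 0; /* skip this list, but keep walking */

	*total += (uint64_t)msl->count * msl->page_sz;
	return 0;
}
```

Returning 0 from the early-out keeps the walk going, so an external list is
ignored rather than treated as a reason to stop.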

* [dpdk-dev] [PATCH v3 03/20] malloc: index heaps using heap ID rather than NUMA node
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (2 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID Anatoly Burakov
                     ` (16 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bruce Richardson, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, shreyansh.jain, shahafs, arybchenko

Switch all parts of EAL over to identifying heaps by heap ID instead
of NUMA node ID. The heap ID of a DPDK-internal heap is its NUMA
node's index within the detected NUMA node list. External heaps
will receive heap IDs in order of their creation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
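A minimal standalone sketch of the socket-ID-to-heap-ID lookup described
above, for illustration only. struct heap and socket_to_heap_id() are
hypothetical stand-ins for struct malloc_heap and
malloc_socket_to_heap_id(); the "total_size != 0 means the slot is in use"
test is an assumption of this sketch and is not part of the patch.

```c
#include <stddef.h>

#define MAX_HEAPS 32 /* mirrors RTE_MAX_HEAPS introduced by this patch */

/* Hypothetical, stripped-down stand-in for struct malloc_heap. */
struct heap {
	unsigned int socket_id; /* socket ID this heap serves */
	size_t total_size;      /* 0 marks an unused slot in this sketch */
};

/* Linear search for the heap serving socket_id; -1 if there is none. */
static int
socket_to_heap_id(const struct heap *heaps, unsigned int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (heaps[i].total_size != 0 &&
				heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

The point of the indirection is that the array index (heap ID) no longer has
to equal the socket ID, so a heap can carry a socket ID that lies outside the
NUMA node range.
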
 config/common_base                            |  1 +
 config/rte_config.h                           |  1 +
 .../common/include/rte_eal_memconfig.h        |  4 +-
 .../common/include/rte_malloc_heap.h          |  1 +
 lib/librte_eal/common/malloc_heap.c           | 98 +++++++++++++------
 lib/librte_eal/common/malloc_heap.h           |  3 +
 lib/librte_eal/common/rte_malloc.c            | 41 +++++---
 7 files changed, 106 insertions(+), 43 deletions(-)

diff --git a/config/common_base b/config/common_base
index 155c7d40e..b52770b27 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 567051b9c..5dd2ac1ad 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
 #define RTE_BUILD_SHARED_LIB
 
 /* EAL defines */
+#define RTE_MAX_HEAPS 32
 #define RTE_MAX_MEMSEG_LISTS 128
 #define RTE_MAX_MEMSEG_PER_LIST 8192
 #define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 6baa6854f..d7920a4e0 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
 
-	/* Heaps of Malloc per socket */
-	struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+	/* Heaps of Malloc */
+	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {
 
 	unsigned alloc_count;
 	size_t total_size;
+	unsigned int socket_id;
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..1d1e35708 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 	return check_flag & flags;
 }
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (heap->socket_id == socket_id)
+			return i;
+	}
+	return -1;
+}
+
 /*
  * Expand the heap with a memory area.
  */
@@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct rte_memseg_list *found_msl;
 	struct malloc_heap *heap;
-	int msl_idx;
+	int msl_idx, heap_idx;
 
 	if (msl->external)
 		return 0;
 
-	heap = &mcfg->malloc_heaps[msl->socket_id];
+	heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+	heap = &mcfg->malloc_heaps[heap_idx];
 
 	/* msl is const, so find it */
 	msl_idx = msl - mcfg->memsegs;
@@ -111,6 +127,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
+	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -561,12 +578,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 
 /* this will try lower page sizes first */
 static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
-		unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+	int socket_id;
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -584,12 +603,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
 	 * we may still be able to allocate memory from appropriate page sizes,
 	 * we just need to request more memory first.
 	 */
+
+	socket_id = rte_socket_id_by_idx(heap_id);
+	/*
+	 * if socket ID is negative, we cannot find a socket ID for this heap -
+	 * which means it's an external heap. those can have unexpected page
+	 * sizes, so if the user asked to allocate from there - assume user
+	 * knows what they're doing, and allow allocating from there with any
+	 * page size flags.
+	 */
+	if (socket_id < 0)
+		size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
 	ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
 	if (ret != NULL)
 		goto alloc_unlock;
 
-	if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
-			contig)) {
+	/* if socket ID is invalid, this is an external heap */
+	if (socket_id < 0)
+		goto alloc_unlock;
+
+	if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+			bound, contig)) {
 		ret = heap_alloc(heap, type, size, flags, align, bound, contig);
 
 		/* this should have succeeded */
@@ -605,7 +640,7 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, heap_id, i;
 	void *ret;
 
 	/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +655,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
-			contig);
+	ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+			bound, contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
-	/* try other heaps */
+	/* try other heaps. we are only iterating through native DPDK sockets,
+	 * so external heaps won't be included.
+	 */
 	for (i = 0; i < (int) rte_socket_count(); i++) {
-		cur_socket = rte_socket_id_by_idx(i);
-		if (cur_socket == socket)
+		if (i == heap_id)
 			continue;
-		ret = heap_alloc_on_socket(type, size, cur_socket, flags,
-				align, bound, contig);
+		ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+				bound, contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -643,11 +681,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 }
 
 static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
-		size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+		unsigned int flags, size_t align, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -665,7 +703,7 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		size_t align, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, i, cur_socket, heap_id;
 	void *ret;
 
 	/* return NULL if align is not power-of-2 */
@@ -680,11 +718,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+	ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
 			contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
@@ -694,8 +734,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		cur_socket = rte_socket_id_by_idx(i);
 		if (cur_socket == socket)
 			continue;
-		ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
-				align, contig);
+		ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+				contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -760,7 +800,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* ...of which we can't avail if we are in legacy mode, or if this is an
 	 * externally allocated segment.
 	 */
-	if (internal_config.legacy_mem || msl->external)
+	if (internal_config.legacy_mem || (msl->external > 0))
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
@@ -917,7 +957,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 int
 malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +995,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f);
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
 int
 rte_eal_malloc_heap_init(void);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..dfcdf380a 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx, ret = -1;
 
-	if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
-		return -1;
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
 
-	return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+	heap_idx = malloc_socket_to_heap_id(socket);
+	if (heap_idx < 0)
+		goto unlock;
+
+	ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+			socket_stats);
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
 }
 
 /*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int idx;
 
-	for (idx = 0; idx < rte_socket_count(); idx++) {
-		unsigned int socket = rte_socket_id_by_idx(idx);
-		fprintf(f, "Heap on socket %i:\n", socket);
-		malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		fprintf(f, "Heap id: %u\n", idx);
+		malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
 	}
 
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
 /*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
 void
 rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 {
-	unsigned int socket;
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int heap_id;
 	struct rte_malloc_socket_stats sock_stats;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
 	/* Iterate through all initialised heaps */
-	for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
-		if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
-			continue;
+	for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 
-		fprintf(f, "Socket:%u\n", socket);
+		malloc_heap_get_stats(heap, &sock_stats);
+
+		fprintf(f, "Heap id:%u\n", heap_id);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
 		fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
 	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	return;
 }
 
-- 
2.17.1

* [dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (3 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 03/20] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 05/20] flow_classify: " Anatoly Burakov
                     ` (15 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact valid, rendering parameter checks for sockets obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
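A small standalone sketch of the relaxed parameter handling, modelled on the
memzone change in this patch. resolve_socket() is a hypothetical helper
written for this note, not a DPDK function; SOCKET_ANY and MAX_NUMA_NODES
stand in for SOCKET_ID_ANY and RTE_MAX_NUMA_NODES, and has_hugepages stands
in for the result of rte_eal_has_hugepages().

```c
#define SOCKET_ANY (-1)  /* same value as DPDK's SOCKET_ID_ANY */
#define MAX_NUMA_NODES 8 /* CONFIG_RTE_MAX_NUMA_NODES in config/common_base */

/*
 * Returns -1 if socket_id is rejected up front, 0 otherwise. On success,
 * *effective holds the socket ID that would be passed on to the allocator,
 * which then decides whether a heap exists for it.
 */
static int
resolve_socket(int socket_id, int has_hugepages, int *effective)
{
	/* only negative IDs other than "any" are rejected here */
	if (socket_id != SOCKET_ANY && socket_id < 0)
		return -1;

	/* without hugepages, NUMA-range IDs collapse to "any";
	 * IDs above the NUMA range are left alone
	 */
	if (!has_hugepages && socket_id < MAX_NUMA_NODES)
		socket_id = SOCKET_ANY;

	*effective = socket_id;
	return 0;
}
```

An ID at or above the NUMA node limit is no longer an error at this level; it
is passed through unchanged so that the heap lookup can accept or reject it.
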
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e96ec9b43..63bbb1b51 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - ``socket_id`` parameter across the entire DPDK has gained additional
+    meaning, as some socket ID's will now be representing externally allocated
+    memory. No changes will be required for existing code as backwards
+    compatibility will be kept, and those who do not use this feature will not
+    see these extra socket ID's. Any new API's must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether socket ID is a valid one.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index dfcdf380a..458c44ba6 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1

* [dpdk-dev] [PATCH v3 05/20] flow_classify: do not check for invalid socket ID
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (4 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 06/20] pipeline: " Anatoly Burakov
                     ` (14 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: Bernard Iremonger, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact valid, rendering parameter checks for sockets obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_flow_classify/rte_flow_classify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_FLOW_CLASSIFY_LOG(ERR,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 06/20] pipeline: do not check for invalid socket ID
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (5 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 05/20] flow_classify: " Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 07/20] sched: " Anatoly Burakov
                     ` (13 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering upper-bound parameter checks for
sockets obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_pipeline/rte_pipeline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_LOG(ERR, PIPELINE,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 07/20] sched: do not check for invalid socket ID
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (6 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 06/20] pipeline: " Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 08/20] malloc: add name to malloc heaps Anatoly Burakov
                     ` (12 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering upper-bound parameter checks for
sockets obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_sched/rte_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
 		return -1;
 
 	/* socket */
-	if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+	if (params->socket < 0)
 		return -3;
 
 	/* rate */
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 08/20] malloc: add name to malloc heaps
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (7 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 07/20] sched: " Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 09/20] malloc: add function to query socket ID of named heap Anatoly Burakov
                     ` (11 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

We will need to refer to external heaps in some way. While we use
heap IDs internally, the external API needs something more
user-friendly, so we will use a string to uniquely identify a heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
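A minimal standalone sketch of the default naming scheme, assuming the
"socket_%i" format used in this patch. The helper name is hypothetical,
and unlike the patch it reports truncation as an error:

```c
#include <stdio.h>

#define HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN from this patch */

/* Write the default name for the heap of a given socket into dst. */
static int
default_heap_name(char *dst, size_t len, int socket_id)
{
	int n = snprintf(dst, len, "socket_%i", socket_id);

	/* snprintf returns the untruncated length; treat truncation as failure */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```
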
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 15 ++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	size_t total_size;
 	unsigned int socket_id;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 73e478076..2a5d2a381 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1020,6 +1019,20 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	/* assign names to default DPDK heaps */
+	for (i = 0; i < rte_socket_count(); i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+		char heap_name[RTE_HEAP_NAME_MAX_LEN];
+		int socket_id = rte_socket_id_by_idx(i);
+
+		snprintf(heap_name, sizeof(heap_name) - 1,
+				"socket_%i", socket_id);
+		strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+		heap->socket_id = socket_id;
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 458c44ba6..0515d47f3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 09/20] malloc: add function to query socket ID of named heap
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (8 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 08/20] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 10/20] malloc: allow creating malloc heaps Anatoly Burakov
                     ` (10 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

When we create external heaps, each will have its own "fake"
socket ID, so add a function that maps a heap name to its
socket ID.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
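A simplified standalone model of the lookup, assuming a plain array in
place of the shared memory config and errno in place of rte_errno. The
struct and function names are hypothetical, and locking is left out:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN */

struct heap_model {
	char name[HEAP_NAME_MAX_LEN];
	int socket_id;
};

/* Length of s, but never scanning more than max bytes. */
static size_t
bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/* Return the socket ID of the heap called name, or -1 with errno set. */
static int
heap_get_socket(const struct heap_model *heaps, size_t n_heaps,
		const char *name)
{
	size_t len, i;

	if (name == NULL) {
		errno = EINVAL;
		return -1;
	}
	len = bounded_len(name, HEAP_NAME_MAX_LEN);
	if (len == 0 || len == HEAP_NAME_MAX_LEN) {
		errno = EINVAL; /* empty or too long */
		return -1;
	}
	for (i = 0; i < n_heaps; i++) {
		if (strncmp(name, heaps[i].name, HEAP_NAME_MAX_LEN) == 0)
			return heaps[i].socket_id;
	}
	errno = ENOENT;
	return -1;
}
```

The returned ID is what a caller would then pass as the socket argument
to the socket-aware allocation functions.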
 lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 37 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..8870732a6 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,20 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Find socket ID corresponding to a named heap.
+ *
+ * @param name
+ *   Heap name to find socket ID for
+ * @return
+ *   Socket ID in case of success (a non-negative number)
+ *   -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``name`` was NULL
+ *     ENOENT - heap identified by the name ``name`` was not found
+ */
+int __rte_experimental
+rte_malloc_heap_get_socket(const char *name);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 0515d47f3..ce18ac79c 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <sys/queue.h>
 
+#include <rte_errno.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
 #include <rte_eal.h>
@@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f)
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
+int
+rte_malloc_heap_get_socket(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int idx;
+	int ret;
+
+	if (name == NULL ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) {
+			heap = tmp;
+			break;
+		}
+	}
+
+	if (heap != NULL) {
+		ret = heap->socket_id;
+	} else {
+		rte_errno = ENOENT;
+		ret = -1;
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bbb0..d8f9665b8 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 10/20] malloc: allow creating malloc heaps
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (9 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 09/20] malloc: add function to query socket ID of named heap Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 11/20] malloc: allow destroying heaps Anatoly Burakov
                     ` (9 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add an API to allow creating new malloc heaps. They will be created
with socket IDs starting at the larger of 256 and RTE_MAX_NUMA_NODES,
to avoid clashing with internal heaps.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
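A simplified standalone model of slot selection and socket ID
assignment, assuming a four-entry table and a node limit of 8 (both
illustrative). Names are hypothetical, locking and name validation are
left out (the caller is assumed to pass a non-empty, short name), and
unlike the patch the model returns the assigned ID rather than 0:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN */
#define MAX_HEAPS 4          /* illustrative; stands in for RTE_MAX_HEAPS */
#define MAX_NUMA_NODES 8     /* illustrative; stands in for RTE_MAX_NUMA_NODES */
#define EXT_MIN_SOCKET_ID \
	((1 << 8) > MAX_NUMA_NODES ? (1 << 8) : MAX_NUMA_NODES)

struct heap_model {
	char name[HEAP_NAME_MAX_LEN];
	uint32_t socket_id;
};

static struct heap_model heaps[MAX_HEAPS];
static uint32_t next_socket_id = EXT_MIN_SOCKET_ID;

/* Claim the first unnamed slot; return its socket ID, or -1 with errno set. */
static int
heap_create(const char *name)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (strncmp(name, heaps[i].name, HEAP_NAME_MAX_LEN) == 0) {
			errno = EEXIST;
			return -1;
		}
		if (heaps[i].name[0] == '\0') {
			/* copy with guaranteed NUL termination */
			strncpy(heaps[i].name, name, HEAP_NAME_MAX_LEN - 1);
			heaps[i].name[HEAP_NAME_MAX_LEN - 1] = '\0';
			heaps[i].socket_id = next_socket_id++;
			return (int)heaps[i].socket_id;
		}
	}
	errno = ENOSPC;
	return -1;
}
```

As in the patch, the scan stops at the first empty slot, so the
duplicate check only covers the slots that precede it.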
 lib/librte_eal/common/include/rte_malloc.h | 19 ++++++++
 lib/librte_eal/common/malloc_heap.c        | 30 +++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 52 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 105 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8870732a6..182afab1c 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 2a5d2a381..1dd4ffcf9 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b)) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1015,6 +1019,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	static uint32_t next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index ce18ac79c..39875fe69 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -286,3 +287,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d8f9665b8..d1e92361b 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 11/20] malloc: allow destroying heaps
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (10 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 10/20] malloc: allow creating malloc heaps Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 12/20] malloc: allow adding memory to named heaps Anatoly Burakov
                     ` (8 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

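Add an API to destroy a specified heap. Destruction fails if the
heap still has outstanding allocations or memory segments, and
internal (per-socket) heaps cannot be destroyed.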

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
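A simplified standalone model of the destroy preconditions, assuming a
node limit of 8 (illustrative) and errno in place of rte_errno. The
struct and function names are hypothetical, and locking is left out:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_NUMA_NODES 8 /* illustrative; stands in for RTE_MAX_NUMA_NODES */

struct heap_model {
	unsigned int socket_id;
	unsigned int alloc_count; /* outstanding allocations */
	void *first;              /* first memory element, NULL if heap is empty */
};

/* Zero the heap if it may be destroyed; otherwise return -1 with errno set. */
static int
heap_destroy(struct heap_model *heap)
{
	if (heap->socket_id < MAX_NUMA_NODES) {
		errno = EPERM; /* internal (per-socket) heaps are reserved */
		return -1;
	}
	if (heap->alloc_count != 0 || heap->first != NULL) {
		errno = EBUSY; /* allocations or memory segments remain */
		return -1;
	}
	memset(heap, 0, sizeof(*heap));
	return 0;
}
```

Zeroing the structure also clears the name, which is what marks the
slot as free for a later create call.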
 lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++
 lib/librte_eal/common/malloc_heap.c        | 22 ++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 58 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 107 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 182afab1c..8a8cc1e6d 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
 
+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ *   from the heap has been freed back to the heap
+ *
+ * @note This function will return a failure result if not all memory segments
+ *   were removed from the heap prior to its destruction
+ *
+ * @param heap_name
+ *   Name of the heap to destroy.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     ENOENT - heap by the name of ``heap_name`` was not found
+ *     EPERM  - attempting to destroy reserved heap
+ *     EBUSY  - heap still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1dd4ffcf9..e98f720cb 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1045,6 +1045,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	return 0;
 }
 
+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+	if (heap->alloc_count != 0) {
+		RTE_LOG(ERR, EAL, "Heap is still in use\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->first != NULL || heap->last != NULL) {
+		RTE_LOG(ERR, EAL, "Heap still contains memory segments\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->total_size != 0)
+		RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+	/* after this, the lock will be dropped */
+	memset(heap, 0, sizeof(*heap));
+
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index eebee16dc..75278da3c 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 39875fe69..6734b0d09 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -288,6 +288,21 @@ rte_malloc_virt2iova(const void *addr)
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
 
+static struct malloc_heap *
+find_named_heap(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN))
+			return heap;
+	}
+	return NULL;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -338,3 +353,46 @@ rte_malloc_heap_create(const char *heap_name)
 
 	return ret;
 }
+
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* look up the heap by name */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to destroy internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	/* sanity checks done, now we can destroy the heap */
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_destroy(heap);
+
+	/* if we failed, lock is still active */
+	if (ret < 0)
+		rte_spinlock_unlock(&heap->lock);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d1e92361b..7db38c8ac 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_create;
+	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 12/20] malloc: allow adding memory to named heaps
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (11 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 11/20] malloc: allow destroying heaps Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 13/20] malloc: allow removing memory from " Anatoly Burakov
                     ` (7 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add an API to add externally allocated memory to a malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If an IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
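A simplified standalone model of the argument checks and the per-page
IOVA fill, assuming errno in place of rte_errno. The function name and
the BAD_IOVA placeholder are hypothetical and only illustrate the rules
described above:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BAD_IOVA UINT64_MAX /* illustrative placeholder for RTE_BAD_IOVA */

/*
 * Validate a chunk description and fill per-page IOVA values into out,
 * which must have room for len / page_sz entries.
 * Returns the page count, or -1 with errno set to EINVAL.
 */
static long
describe_pages(size_t len, size_t page_sz, const uint64_t *iova_addrs,
		unsigned int n_pages, uint64_t *out)
{
	size_t n, i;

	/* page size must be a non-zero power of two */
	if (page_sz == 0 || (page_sz & (page_sz - 1)) != 0) {
		errno = EINVAL;
		return -1;
	}
	n = len / page_sz;
	/* the IOVA table, when given, must describe every page */
	if (iova_addrs != NULL && n != n_pages) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < n; i++)
		out[i] = iova_addrs == NULL ? BAD_IOVA : iova_addrs[i];
	return (long)n;
}
```

Note that n_pages is only compared against len / page_sz when a table
is supplied, matching the "ignored if NULL" wording in the API comment.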
 lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 74 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 51 +++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 169 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8a8cc1e6d..47f867a05 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,45 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Add memory chunk to a heap with specified name.
+ *
+ * @note Multiple memory chunks can be added to the same heap
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ *   malloc heap. Failing to do so will result in undefined behavior, up to and
+ *   including segmentation faults.
+ *
+ * @note Calling this function will erase any contents already present at the
+ *   supplied memory address.
+ *
+ * @param heap_name
+ *   Name of the heap to add memory chunk to
+ * @param va_addr
+ *   Start of virtual area to add to the heap
+ * @param len
+ *   Length of virtual area to add to the heap
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to add memory to a reserved heap
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index e98f720cb..2f6946f65 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1019,6 +1019,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	char fbarray_name[RTE_FBARRAY_NAME_LEN];
+	struct rte_memseg_list *msl = NULL;
+	struct rte_fbarray *arr;
+	size_t seg_len = n_pages * page_sz;
+	unsigned int i;
+
+	/* first, find a free memseg list */
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+		if (tmp->base_va == NULL) {
+			msl = tmp;
+			break;
+		}
+	}
+	if (msl == NULL) {
+		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+			heap->name, va_addr);
+
+	/* create the backing fbarray */
+	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+			sizeof(struct rte_memseg)) < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+		return -1;
+	}
+	arr = &msl->memseg_arr;
+
+	/* fbarray created, fill it up */
+	for (i = 0; i < n_pages; i++) {
+		struct rte_memseg *ms;
+
+		rte_fbarray_set_used(arr, i);
+		ms = rte_fbarray_get(arr, i);
+		ms->addr = RTE_PTR_ADD(va_addr, i * page_sz);
+		ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+		ms->hugepage_sz = page_sz;
+		ms->len = page_sz;
+		ms->nchannel = rte_memory_get_nchannel();
+		ms->nrank = rte_memory_get_nrank();
+		ms->socket_id = heap->socket_id;
+	}
+
+	/* set up the memseg list */
+	msl->base_va = va_addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = heap->socket_id;
+	msl->len = seg_len;
+	msl->version = 0;
+	msl->external = 1;
+
+	/* erase contents of new memory */
+	memset(va_addr, 0, seg_len);
+
+	/* now, add newly minted memory to the malloc heap */
+	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+	heap->total_size += seg_len;
+
+	/* all done! */
+	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+			heap->name, va_addr);
+
+	return 0;
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 75278da3c..237ce9dc2 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 6734b0d09..9d2041b7b 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -303,6 +303,57 @@ find_named_heap(const char *name)
 	return NULL;
 }
 
+int
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int n;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL ||
+			page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		/* lock not taken yet, so do not go through unlock */
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot add memory to internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	n = len / page_sz;
+	if (n != n_pages && iova_addrs != NULL) {
+		rte_errno = EINVAL;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+			page_sz);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 7db38c8ac..939124753 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -321,6 +321,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_create;
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_memory_add;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v3 13/20] malloc: allow removing memory from named heaps
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (12 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 12/20] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 14/20] malloc: allow attaching to external memory chunks Anatoly Burakov
                     ` (6 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
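Note: the removal checks described in the commit message can be modelled
without DPDK. The following is only a standalone sketch under stated
assumptions, not the code in this patch: all `toy_*` names are hypothetical,
and the real implementation walks `malloc_elem` entries while holding the
heap lock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_state { TOY_FREE, TOY_BUSY };

/* Hypothetical stand-in for one chunk of external memory in a heap. */
struct toy_chunk {
	char *base;            /* start address that was added */
	size_t len;            /* length that was added */
	enum toy_state state;  /* state of the first element in the chunk */
	size_t first_elem_sz;  /* size of the first element in the chunk */
};

/*
 * Returns 0 if the chunk may be removed, or a negative errno value:
 * -ENOENT if no chunk matches va/len, -EBUSY if it still holds allocations.
 */
static int
toy_remove(const struct toy_chunk *chunks, size_t n, const char *va, size_t len)
{
	size_t i;

	assert(chunks != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct toy_chunk *c = &chunks[i];

		if (c->base != va)
			continue;
		/* partial removal is not supported */
		if (c->len != len)
			return -ENOENT;
		/* a fully free chunk is one free element spanning all of it */
		if (c->state == TOY_BUSY || c->first_elem_sz != len)
			return -EBUSY;
		return 0;
	}
	return -ENOENT;
}
```

In this model, a request only succeeds when it names the exact start address
and length that were added and the chunk holds a single free element.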
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++
 lib/librte_eal/common/malloc_heap.c        | 54 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 125 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 47f867a05..9bbe8e3af 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -302,6 +302,33 @@ int __rte_experimental
 rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+/**
+ * Remove memory chunk from heap with specified name.
+ *
+ * @note Memory chunk being removed must be the same as one that was added;
+ *   partially removing memory chunks is not supported
+ *
+ * @note Memory area must not contain any allocated elements to allow its
+ *   removal from the heap
+ *
+ * @param heap_name
+ *   Name of the heap to remove memory from
+ * @param va_addr
+ *   Virtual address to remove from the heap
+ * @param len
+ *   Length of virtual area to remove from the heap
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to remove memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ *     EBUSY  - memory chunk still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 2f6946f65..3ac3b06de 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1019,6 +1019,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+	struct malloc_heap *heap = elem->heap;
+	struct rte_memseg_list *msl;
+
+	msl = elem->msl;
+
+	/* this element can be removed */
+	malloc_elem_free_list_remove(elem);
+	malloc_elem_hide_region(elem, elem, len);
+
+	heap->total_size -= len;
+
+	memset(elem, 0, sizeof(*elem));
+
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -1093,6 +1119,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	return 0;
 }
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len)
+{
+	struct malloc_elem *elem = heap->first;
+
+	/* find element with specified va address */
+	while (elem != NULL && elem != va_addr) {
+		elem = elem->next;
+		/* stop if we've blown past our VA */
+		if (elem > (struct malloc_elem *)va_addr) {
+			rte_errno = ENOENT;
+			return -1;
+		}
+	}
+	/* check if element was found */
+	if (elem == NULL || elem->msl->len != len) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+	/* if element's size is not equal to segment len, segment is busy */
+	if (elem->state == ELEM_BUSY || elem->size != len) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+	return destroy_seg(elem, len);
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 237ce9dc2..e48996d52 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -43,6 +43,10 @@ int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9d2041b7b..aed066882 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -354,6 +354,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot remove memory from internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 939124753..c0a8220d0 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_remove;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v3 14/20] malloc: allow attaching to external memory chunks
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (13 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 13/20] malloc: allow removing memory from " Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 15/20] malloc: allow detaching from external memory Anatoly Burakov
                     ` (5 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
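Note: the lookup that attach performs (match a memseg list by start address
and total length, fail with ENOENT unless a match is found) can be sketched
as a standalone model. This is an illustration under assumptions rather than
the patch code: `toy_msl`, `toy_walk_arg` and `toy_attach` are hypothetical
names, and setting the `attached` flag stands in for the fbarray attach.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified view of a memseg list. */
struct toy_msl {
	void *base_va;
	size_t page_sz;
	size_t n_pages;
	int attached;
};

struct toy_walk_arg {
	void *va_addr;
	size_t len;
	int result;
};

/* Walk callback: returns 1 to stop the walk, 0 to continue. */
static int
toy_attach_cb(struct toy_msl *msl, struct toy_walk_arg *wa)
{
	if (msl->base_va != wa->va_addr ||
			msl->page_sz * msl->n_pages != wa->len)
		return 0;
	msl->attached = 1;
	wa->result = 0;
	return 1;
}

/* Returns 0 on a successful attach, -ENOENT if no list matches. */
static int
toy_attach(struct toy_msl *lists, size_t n, void *va_addr, size_t len)
{
	/* fail unless the callback explicitly reports success */
	struct toy_walk_arg wa = { va_addr, len, -ENOENT };
	size_t i;

	assert(lists != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (toy_attach_cb(&lists[i], &wa))
			break;
	return wa.result;
}
```

Starting from a failure value and overwriting it only on a match keeps the
"not found" case from needing a separate code path after the walk.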
 lib/librte_eal/common/include/rte_malloc.h | 48 +++++++++--
 lib/librte_eal/common/rte_malloc.c         | 93 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 135 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 9bbe8e3af..440496cd9 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket,
  *
  * @note Multiple memory chunks can be added to the same heap
  *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling
+ *   ``rte_malloc_heap_memory_attach``.
+ *
  * @note Memory must be previously allocated for DPDK to be able to use it as a
  *   malloc heap. Failing to do so will result in undefined behavior, up to and
  *   including segmentation faults.
@@ -329,21 +333,48 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 int __rte_experimental
 rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Attach to a chunk of external memory that was added by another process.
+ *
+ * @note This function must be called before any attempt is made to use an
+ *   already existing external memory chunk. This function does *not* need to
+ *   be called if a call to ``rte_malloc_heap_memory_add`` was made in the
+ *   current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to attach to
+ * @param len
+ *   Length of memory chunk to attach to
+ * @return
+ *   0 on successful attach
+ *   -1 on unsuccessful attach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to attach memory to a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
  * @note Heaps created via this call will automatically get assigned a unique
  *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
  *
+ * @note This function can only be called in primary process.
+ *
  * @param heap_name
  *   Name of the heap to create.
  *
  * @return
  *   - 0 on successful creation
  *   - -1 in case of error, with rte_errno set to one of the following:
- *     EINVAL - ``heap_name`` was NULL, empty or too long
- *     EEXIST - heap by name of ``heap_name`` already exists
- *     ENOSPC - no more space in internal config to store a new heap
+ *     EINVAL          - ``heap_name`` was NULL, empty or too long
+ *     EEXIST          - heap by name of ``heap_name`` already exists
+ *     ENOSPC          - no more space in internal config to store a new heap
+ *     E_RTE_SECONDARY - attempted to create a heap in secondary process
  */
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
@@ -357,16 +388,19 @@ rte_malloc_heap_create(const char *heap_name);
  * @note This function will return a failure result if not all memory segments
  *   were removed from the heap prior to its destruction
  *
+ * @note This function can only be called in primary process.
+ *
  * @param heap_name
  *   Name of the heap to create.
  *
  * @return
  *   - 0 on success
  *   - -1 in case of error, with rte_errno set to one of the following:
- *     EINVAL - ``heap_name`` was NULL, empty or too long
- *     ENOENT - heap by the name of ``heap_name`` was not found
- *     EPERM  - attempting to destroy reserved heap
- *     EBUSY  - heap still contains data
+ *     EINVAL          - ``heap_name`` was NULL, empty or too long
+ *     ENOENT          - heap by the name of ``heap_name`` was not found
+ *     EPERM           - attempting to destroy reserved heap
+ *     EBUSY           - heap still contains data
+ *     E_RTE_SECONDARY - attempted to destroy a heap in secondary process
  */
 int __rte_experimental
 rte_malloc_heap_destroy(const char *heap_name);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index aed066882..bc22d21e4 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -393,6 +393,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+struct sync_mem_walk_arg {
+	void *va_addr;
+	size_t len;
+	int result;
+};
+
+static int
+attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct sync_mem_walk_arg *wa = arg;
+	size_t len = msl->page_sz * msl->memseg_arr.len;
+
+	if (msl->base_va == wa->va_addr &&
+			len == wa->len) {
+		struct rte_memseg_list *found_msl;
+		int msl_idx, ret;
+
+		/* msl is const */
+		msl_idx = msl - mcfg->memsegs;
+		found_msl = &mcfg->memsegs[msl_idx];
+
+		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+
+		if (ret < 0)
+			wa->result = -rte_errno;
+		else
+			wa->result = 0;
+		return 1;
+	}
+	return 0;
+}
+
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	struct sync_mem_walk_arg wa;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to attach to internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* find corresponding memseg list to attach to */
+	wa.va_addr = va_addr;
+	wa.len = len;
+	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+
+	/* we're already holding a read lock */
+	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+
+	if (wa.result < 0) {
+		rte_errno = -wa.result;
+		ret = -1;
+	} else {
+		ret = 0;
+	}
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -400,6 +483,11 @@ rte_malloc_heap_create(const char *heap_name)
 	struct malloc_heap *heap = NULL;
 	int i, ret;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		rte_errno = E_RTE_SECONDARY;
+		return -1;
+	}
+
 	if (heap_name == NULL ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
@@ -451,6 +539,11 @@ rte_malloc_heap_destroy(const char *heap_name)
 	struct malloc_heap *heap = NULL;
 	int ret;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		rte_errno = E_RTE_SECONDARY;
+		return -1;
+	}
+
 	if (heap_name == NULL ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index c0a8220d0..0a2e46767 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_attach;
 	rte_malloc_heap_memory_remove;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1


* [dpdk-dev] [PATCH v3 15/20] malloc: allow detaching from external memory
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (14 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 14/20] malloc: allow attaching to external memory chunks Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 16/20] test: add unit tests for external memory support Anatoly Burakov
                     ` (4 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add API to detach from existing chunk of external memory in a
process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
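Note: since every other process has to detach before the chunk is removed,
and that ordering is left to the application, one possible way to track it on
the application side is sketched below. This is purely an assumption for
illustration: DPDK does not perform this bookkeeping in this series, and all
`toy_*` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical application-side record for one external memory chunk. */
struct toy_chunk_users {
	unsigned int n_attached; /* secondaries currently attached */
};

/* Called by the application when a secondary reports a successful attach. */
static void
toy_note_attach(struct toy_chunk_users *u)
{
	assert(u != NULL);
	u->n_attached++;
}

/* Called when a secondary reports a successful detach. */
static int
toy_note_detach(struct toy_chunk_users *u)
{
	assert(u != NULL);
	if (u->n_attached == 0)
		return -EINVAL; /* detach without a matching attach */
	u->n_attached--;
	return 0;
}

/* Returns 0 if the owner may remove the chunk now, -EBUSY otherwise. */
static int
toy_may_remove(const struct toy_chunk_users *u)
{
	assert(u != NULL);
	return u->n_attached == 0 ? 0 : -EBUSY;
}
```

How attach and detach reports travel between processes is left open here;
the sketch only records the ordering rule itself.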
 lib/librte_eal/common/include/rte_malloc.h | 27 ++++++++++++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 27 ++++++++++++++++++----
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 440496cd9..d2236c421 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
  * @note Memory area must not contain any allocated elements to allow its
  *   removal from the heap
  *
+ * @note All other processes must detach from the memory chunk prior to it being
+ *   removed from the heap.
+ *
  * @param heap_name
  *   Name of the heap to remove memory from
  * @param va_addr
@@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 int __rte_experimental
 rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Detach from a chunk of external memory in secondary process.
+ *
+ * @note This function must be called before any attempt is made to remove
+ *   external memory from the heap in another process. This function does *not*
+ *   need to be called if ``rte_malloc_heap_memory_remove`` will be called in
+ *   the current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to detach from
+ * @param len
+ *   Length of memory chunk to detach from
+ * @return
+ *   0 on successful detach
+ *   -1 on unsuccessful detach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to detach memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index bc22d21e4..e9be179d5 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -397,10 +397,11 @@ struct sync_mem_walk_arg {
 	void *va_addr;
 	size_t len;
 	int result;
+	bool attach;
 };
 
 static int
-attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct sync_mem_walk_arg *wa = arg;
@@ -415,7 +416,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		if (wa->attach)
+			ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		else
+			ret = rte_fbarray_detach(&found_msl->memseg_arr);
 
 		if (ret < 0)
 			wa->result = -rte_errno;
@@ -426,8 +430,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 	return 0;
 }
 
-int
-rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+static int
+sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
@@ -461,9 +465,10 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	wa.va_addr = va_addr;
 	wa.len = len;
 	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+	wa.attach = attach;
 
 	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
 
 	if (wa.result < 0) {
 		rte_errno = -wa.result;
@@ -476,6 +481,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, true);
+}
+
+int
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, false);
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 0a2e46767..a535c4da8 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -323,6 +323,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
 	rte_malloc_heap_memory_attach;
+	rte_malloc_heap_memory_detach;
 	rte_malloc_heap_memory_remove;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1


* [dpdk-dev] [PATCH v3 16/20] test: add unit tests for external memory support
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (15 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 15/20] malloc: allow detaching from external memory Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 17/20] examples: add external memory example app Anatoly Burakov
                     ` (3 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add simple unit tests to test external memory support.
The tests are pretty basic and mostly consist of checking
if invalid API calls are handled correctly, plus a simple
allocation/deallocation test for malloc and memzone.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
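Note: the invalid-parameter cases exercised by these tests for the add path
can be summarised as a standalone model. This is a sketch under assumptions,
not the EAL code: `toy_check_add_args` and `TOY_NAME_MAX` are hypothetical
(the value 32 is assumed), and the zero-length check reflects what the unit
test expects rather than a check visible in the add path quoted in this
series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NAME_MAX 32 /* assumed limit, stands in for RTE_HEAP_NAME_MAX_LEN */

/* Length of s, but never scanning more than max characters. */
static size_t
toy_bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	assert(s != NULL);
	while (n < max && s[n] != '\0')
		n++;
	return n;
}

static int
toy_is_pow2(size_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Returns 0 if the arguments look valid, -EINVAL otherwise. */
static int
toy_check_add_args(const char *name, const void *va, size_t len,
		const void *iova, unsigned int n_pages, size_t page_sz)
{
	size_t name_len;

	if (name == NULL || va == NULL || len == 0 || !toy_is_pow2(page_sz))
		return -EINVAL;
	name_len = toy_bounded_len(name, TOY_NAME_MAX);
	if (name_len == 0 || name_len == TOY_NAME_MAX)
		return -EINVAL;
	/* the page count only matters when an IOVA table is supplied */
	if (iova != NULL && len / page_sz != n_pages)
		return -EINVAL;
	return 0;
}
```

Passing no IOVA table with a page count of 0 is accepted in this model, which
matches how the tests call the add function when they only probe name and
address handling.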
 test/test/Makefile            |   1 +
 test/test/autotest_data.py    |  14 +-
 test/test/meson.build         |   1 +
 test/test/test_external_mem.c | 389 ++++++++++++++++++++++++++++++++++
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 test/test/test_external_mem.c

diff --git a/test/test/Makefile b/test/test/Makefile
index e6967bab6..074ac6e03 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c
 SRCS-y += test_reciprocal_division.c
 SRCS-y += test_reciprocal_division_perf.c
 SRCS-y += test_fbarray.c
+SRCS-y += test_external_mem.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py
index f68d9b111..51f8e1689 100644
--- a/test/test/autotest_data.py
+++ b/test/test/autotest_data.py
@@ -477,10 +477,16 @@
         "Report":  None,
     },
     {
-        "Name":    "Fbarray autotest",
-        "Command": "fbarray_autotest",
-        "Func":    default_autotest,
-        "Report":  None,
+	"Name":    "Fbarray autotest",
+	"Command": "fbarray_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
+    },
+    {
+	"Name":    "External memory autotest",
+	"Command": "external_mem_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
     },
     #
     #Please always keep all dump tests at the end and together!
diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6eca2..3abf02b71 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -155,6 +155,7 @@ test_names = [
 	'eventdev_common_autotest',
 	'eventdev_octeontx_autotest',
 	'eventdev_sw_autotest',
+	'external_mem_autotest',
 	'func_reentrancy_autotest',
 	'flow_classify_autotest',
 	'hash_scaling_autotest',
diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c
new file mode 100644
index 000000000..d0837aa35
--- /dev/null
+++ b/test/test/test_external_mem.c
@@ -0,0 +1,389 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+
+#include "test.h"
+
+#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */
+
+static int
+test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova,
+		int n_pages)
+{
+	static const char * const names[] = {
+		NULL, /* NULL name */
+		"",   /* empty name */
+		"this heap name is definitely way too long to be valid"
+	};
+	const char *valid_name = "valid heap name";
+	unsigned int i;
+
+	/* check invalid name handling */
+	for (i = 0; i < RTE_DIM(names); i++) {
+		const char *name = names[i];
+
+		/* these calls may fail for other reasons, so check errno */
+		if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Created heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Destroyed heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_get_socket(name) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Found socket for heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_add(name, addr, len,
+				NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Added memory to heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Removed memory from heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Attached memory to heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Detached memory from heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* do same as above, but with a valid heap name */
+
+	/* skip create call */
+	if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Destroyed non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_get_socket(valid_name) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Found socket for non-existent heap\n",
+				__func__, __LINE__);
+		goto fail;
+	}
+
+	/* these calls may fail for other reasons, so check errno */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Added memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Removed memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Attached memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Detached memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* create a valid heap but test other invalid parameters */
+	if (rte_malloc_heap_create(valid_name) != 0) {
+		printf("%s():%i: Failed to create valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero length */
+	if (rte_malloc_heap_memory_add(valid_name, addr, 0,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero address */
+	if (rte_malloc_heap_memory_add(valid_name, NULL, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* wrong page count */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* tests passed, destroy heap */
+	if (rte_malloc_heap_destroy(valid_name) != 0) {
+		printf("%s():%i: Failed to destroy valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	return 0;
+fail:
+	rte_malloc_heap_destroy(valid_name);
+	return -1;
+}
+
+static int
+test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages)
+{
+	const char *heap_name = "heap";
+	void *ptr = NULL;
+	int socket_id, i;
+	const struct rte_memzone *mz = NULL;
+
+	/* create heap */
+	if (rte_malloc_heap_create(heap_name) != 0) {
+		printf("%s():%i: Failed to create malloc heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* get socket ID corresponding to this heap */
+	socket_id = rte_malloc_heap_get_socket(heap_name);
+	if (socket_id < 0) {
+		printf("%s():%i: cannot find socket for external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* heap is empty, so any allocation should fail */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr != NULL) {
+		printf("%s():%i: Allocated from empty heap\n", __func__,
+			__LINE__);
+		goto fail;
+	}
+
+	/* add memory to heap */
+	if (rte_malloc_heap_memory_add(heap_name, addr, len,
+			iova, n_pages, pgsz) != 0) {
+		printf("%s():%i: Failed to add memory to heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check that we can get this memory from EAL now */
+	for (i = 0; i < n_pages; i++) {
+		const struct rte_memseg *ms;
+		void *cur = RTE_PTR_ADD(addr, pgsz * i);
+
+		ms = rte_mem_virt2memseg(cur, NULL);
+		if (ms == NULL) {
+			printf("%s():%i: Failed to retrieve memseg for external mem\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (ms->addr != cur) {
+			printf("%s():%i: VA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+		if (ms->iova != iova[i]) {
+			printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* allocate - this now should succeed */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr == NULL) {
+		printf("%s():%i: Failed to allocate from external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check if address is in expected range */
+	if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) {
+		printf("%s():%i: Allocated from unexpected address space\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* we've allocated something - removing memory should fail */
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 ||
+			rte_errno != EBUSY) {
+		printf("%s():%i: Removing memory succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) {
+		printf("%s():%i: Destroying heap succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* try allocating an IOVA-contiguous memzone - this should succeed
+	 * because we've set up a contiguous IOVA table.
+	 */
+	mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id,
+			RTE_MEMZONE_IOVA_CONTIG);
+	if (mz == NULL) {
+		printf("%s():%i: Failed to reserve memzone\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	rte_malloc_dump_stats(stdout, NULL);
+	rte_malloc_dump_heaps(stdout);
+
+	/* free memory - removing it should now succeed */
+	rte_free(ptr);
+	ptr = NULL;
+
+	rte_memzone_free(mz);
+	mz = NULL;
+
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) {
+		printf("%s():%i: Removing memory from heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) != 0) {
+		printf("%s():%i: Destroying heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	return 0;
+fail:
+	rte_memzone_free(mz);
+	rte_free(ptr);
+	/* even if something failed, attempt to clean up */
+	rte_malloc_heap_memory_remove(heap_name, addr, len);
+	rte_malloc_heap_destroy(heap_name);
+
+	return -1;
+}
+
+/* we need to test attach/detach in secondary processes. */
+static int
+test_external_mem(void)
+{
+	size_t len = EXTERNAL_MEM_SZ;
+	size_t pgsz = RTE_PGSIZE_4K;
+	rte_iova_t iova[len / pgsz];
+	void *addr;
+	int ret, n_pages;
+
+	/* create external memory area */
+	n_pages = RTE_DIM(iova);
+	addr = mmap(NULL, len, PROT_WRITE | PROT_READ,
+			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED) {
+		printf("%s():%i: Failed to create dummy memory area\n",
+			__func__, __LINE__);
+		return -1;
+	}
+	for (int i = 0; i < n_pages; i++) {
+		/* arbitrary IOVA */
+		rte_iova_t tmp = 0x100000000 + i * pgsz;
+		iova[i] = tmp;
+	}
+
+	ret = test_invalid_param(addr, len, pgsz, iova, n_pages);
+	ret |= test_basic(addr, len, pgsz, iova, n_pages);
+
+	munmap(addr, len);
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem);
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 17/20] examples: add external memory example app
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (16 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 16/20] test: add unit tests for external memory support Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 22:47     ` Ananyev, Konstantin
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 18/20] doc: add external memory feature to the release notes Anatoly Burakov
                     ` (2 subsequent siblings)
  20 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Introduce an example application demonstrating the use of
external memory support. This is a simple application based on
the skeleton app, but instead of using internal DPDK memory, it
uses externally allocated memory.

The RX/TX and init paths are a carbon copy of the skeleton app,
with no modifications whatsoever. The only differences are an
additional init stage to allocate memory and create a heap for it,
and the socket ID supplied to the mempool initialization function.
The memory used by this app is hugepage memory allocated anonymously.

Anonymous hugepage memory will not be allocated in a NUMA-aware
fashion, so there is a chance of performance degradation when
using this app, but given that the kernel usually allocates
hugepages on the local socket first, this should not be a problem
in most cases.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
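
Note on the hugepage size flag: pagesz_flags() in extmem.c encodes the
requested page size as log2(page size) shifted left by MAP_HUGE_SHIFT.
A minimal standalone sketch of that arithmetic follows. The example_*
names are hypothetical, the shift of 26 is the same fallback value the
patch uses, and the log2 helper only handles power-of-two sizes.

```c
#include <stdint.h>

/* Assumption: 26 is the fallback the patch uses when MAP_HUGE_SHIFT
 * is not provided by the system headers. */
#define EXAMPLE_HUGE_SHIFT 26

/* log2 of a power-of-two value; returns 0 for 0, like log2_u64() */
static unsigned int
example_log2(uint64_t v)
{
	unsigned int n = 0;

	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/* build the page-size part of the mmap() flags for a given page size */
static int
example_pagesz_flags(uint64_t page_sz)
{
	return (int)(example_log2(page_sz) << EXAMPLE_HUGE_SHIFT);
}
```

With these definitions a 2MB page gives 21 << 26 and a 1GB page gives
30 << 26; the result is OR-ed with MAP_HUGETLB in alloc_mem().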
 examples/external_mem/Makefile    |  62 ++++
 examples/external_mem/extmem.c    | 461 ++++++++++++++++++++++++++++++
 examples/external_mem/meson.build |  12 +
 3 files changed, 535 insertions(+)
 create mode 100644 examples/external_mem/Makefile
 create mode 100644 examples/external_mem/extmem.c
 create mode 100644 examples/external_mem/meson.build
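
Note on the memory estimate: calc_mem_size() adds a fixed 16MB of
headroom to the space needed for the mbufs and rounds the total up to
the page size. The sketch below reproduces that arithmetic without
DPDK so it can be checked by hand. The example_* names are
hypothetical, and the object size is passed in by the caller instead of
coming from rte_mempool_calc_obj_size().

```c
#include <stdint.h>

/* 16MB of headroom, the same pessimistic constant the patch uses */
#define EXAMPLE_HDR_MEM (UINT64_C(16) << 20)

/* round v up to a multiple of align (align must be non-zero) */
static uint64_t
example_align_up(uint64_t v, uint64_t align)
{
	return ((v + align - 1) / align) * align;
}

/*
 * Estimate the bytes needed for nb_mbufs objects of obj_sz bytes each.
 * iova_contig != 0 models the RTE_IOVA_VA branch, 0 models the branch
 * where objects must not cross page boundaries.
 * Returns 0 if the inputs cannot be handled.
 */
static uint64_t
example_calc_mem_size(uint64_t nb_mbufs, uint64_t obj_sz, uint64_t pgsz,
		int iova_contig)
{
	uint64_t mbuf_mem;

	if (obj_sz == 0 || pgsz == 0)
		return 0;

	if (iova_contig) {
		/* contiguous - no need to account for page boundaries */
		mbuf_mem = nb_mbufs * obj_sz;
	} else {
		uint64_t per_pg = pgsz / obj_sz;
		uint64_t n_pages;

		if (per_pg == 0)
			return 0; /* object does not fit in one page */
		n_pages = nb_mbufs / per_pg + (nb_mbufs % per_pg > 0);
		mbuf_mem = n_pages * pgsz;
	}

	return example_align_up(EXAMPLE_HDR_MEM + mbuf_mem, pgsz);
}
```

As an illustration only, with two ports of 8191 mbufs each, 2MB pages
and an assumed object size of 2368 bytes (a made-up figure), both
branches come out at 27 pages, i.e. 54MB.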

diff --git a/examples/external_mem/Makefile b/examples/external_mem/Makefile
new file mode 100644
index 000000000..3b6ab3b2f
--- /dev/null
+++ b/examples/external_mem/Makefile
@@ -0,0 +1,62 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2010-2018 Intel Corporation
+
+# binary name
+APP = extmem
+
+# all source are stored in SRCS-y
+SRCS-y := extmem.c
+
+# Build using pkg-config variables if possible
+$(shell pkg-config --exists libdpdk)
+ifeq ($(.SHELLSTATUS),0)
+
+all: shared
+.PHONY: shared static
+shared: build/$(APP)-shared
+	ln -sf $(APP)-shared build/$(APP)
+static: build/$(APP)-static
+	ln -sf $(APP)-static build/$(APP)
+
+PC_FILE := $(shell pkg-config --path libdpdk)
+CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
+LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
+
+build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
+	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
+
+build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
+	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
+
+build:
+	@mkdir -p $@
+
+.PHONY: clean
+clean:
+	rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
+	rmdir --ignore-fail-on-non-empty build
+
+else # Build using legacy build system
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-native-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_extmem.o += -Wno-return-type
+endif
+
+include $(RTE_SDK)/mk/rte.extapp.mk
+endif
diff --git a/examples/external_mem/extmem.c b/examples/external_mem/extmem.c
new file mode 100644
index 000000000..818a02171
--- /dev/null
+++ b/examples/external_mem/extmem.c
@@ -0,0 +1,461 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2018 Intel Corporation
+ */
+
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdbool.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+#include <rte_eal.h>
+#include <rte_ethdev.h>
+#include <rte_cycles.h>
+#include <rte_lcore.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
+#include <rte_vfio.h>
+
+#define RX_RING_SIZE 1024
+#define TX_RING_SIZE 1024
+
+#define NUM_MBUFS 8191
+#define MBUF_CACHE_SIZE 250
+#define BURST_SIZE 32
+#define EXTMEM_HEAP_NAME "extmem"
+
+static const struct rte_eth_conf port_conf_default = {
+	.rxmode = {
+		.max_rx_pkt_len = ETHER_MAX_LEN,
+	},
+};
+
+/* extmem.c: Basic DPDK skeleton forwarding example using external memory. */
+
+/*
+ * Initializes a given port using global settings and with the RX buffers
+ * coming from the mbuf_pool passed as a parameter.
+ */
+static inline int
+port_init(uint16_t port, struct rte_mempool *mbuf_pool)
+{
+	struct rte_eth_conf port_conf = port_conf_default;
+	const uint16_t rx_rings = 1, tx_rings = 1;
+	uint16_t nb_rxd = RX_RING_SIZE;
+	uint16_t nb_txd = TX_RING_SIZE;
+	int retval;
+	uint16_t q;
+	struct rte_eth_dev_info dev_info;
+	struct rte_eth_txconf txconf;
+
+	if (!rte_eth_dev_is_valid_port(port))
+		return -1;
+
+	rte_eth_dev_info_get(port, &dev_info);
+	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE)
+		port_conf.txmode.offloads |=
+			DEV_TX_OFFLOAD_MBUF_FAST_FREE;
+
+	/* Configure the Ethernet device. */
+	retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
+	if (retval != 0)
+		return retval;
+
+	retval = rte_eth_dev_adjust_nb_rx_tx_desc(port, &nb_rxd, &nb_txd);
+	if (retval != 0)
+		return retval;
+
+	/* Allocate and set up 1 RX queue per Ethernet port. */
+	for (q = 0; q < rx_rings; q++) {
+		retval = rte_eth_rx_queue_setup(port, q, nb_rxd,
+				rte_eth_dev_socket_id(port), NULL, mbuf_pool);
+		if (retval < 0)
+			return retval;
+	}
+
+	txconf = dev_info.default_txconf;
+	txconf.offloads = port_conf.txmode.offloads;
+	/* Allocate and set up 1 TX queue per Ethernet port. */
+	for (q = 0; q < tx_rings; q++) {
+		retval = rte_eth_tx_queue_setup(port, q, nb_txd,
+				rte_eth_dev_socket_id(port), &txconf);
+		if (retval < 0)
+			return retval;
+	}
+
+	/* Start the Ethernet port. */
+	retval = rte_eth_dev_start(port);
+	if (retval < 0)
+		return retval;
+
+	/* Display the port MAC address. */
+	struct ether_addr addr;
+	rte_eth_macaddr_get(port, &addr);
+	printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8
+			   " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n",
+			port,
+			addr.addr_bytes[0], addr.addr_bytes[1],
+			addr.addr_bytes[2], addr.addr_bytes[3],
+			addr.addr_bytes[4], addr.addr_bytes[5]);
+
+	/* Enable RX in promiscuous mode for the Ethernet device. */
+	rte_eth_promiscuous_enable(port);
+
+	return 0;
+}
+
+/*
+ * The lcore main. This is the main thread that does the work, reading from
+ * an input port and writing to an output port.
+ */
+static __attribute__((noreturn)) void
+lcore_main(void)
+{
+	uint16_t port;
+
+	/*
+	 * Check that the port is on the same NUMA node as the polling thread
+	 * for best performance.
+	 */
+	RTE_ETH_FOREACH_DEV(port)
+		if (rte_eth_dev_socket_id(port) > 0 &&
+				rte_eth_dev_socket_id(port) !=
+						(int)rte_socket_id())
+			printf("WARNING, port %u is on remote NUMA node to "
+					"polling thread.\n\tPerformance will "
+					"not be optimal.\n", port);
+
+	printf("\nCore %u forwarding packets. [Ctrl+C to quit]\n",
+			rte_lcore_id());
+
+	/* Run until the application is quit or killed. */
+	for (;;) {
+		/*
+		 * Receive packets on a port and forward them on the paired
+		 * port. The mapping is 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, etc.
+		 */
+		RTE_ETH_FOREACH_DEV(port) {
+
+			/* Get burst of RX packets, from first port of pair. */
+			struct rte_mbuf *bufs[BURST_SIZE];
+			const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
+					bufs, BURST_SIZE);
+
+			if (unlikely(nb_rx == 0))
+				continue;
+
+			/* Send burst of TX packets, to second port of pair. */
+			const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
+					bufs, nb_rx);
+
+			/* Free any unsent packets. */
+			if (unlikely(nb_tx < nb_rx)) {
+				uint16_t buf;
+				for (buf = nb_tx; buf < nb_rx; buf++)
+					rte_pktmbuf_free(bufs[buf]);
+			}
+		}
+	}
+}
+
+/* extremely pessimistic estimation of memory required to create a mempool */
+static int
+calc_mem_size(uint32_t nb_ports, uint32_t nb_mbufs_per_port,
+		uint32_t mbuf_sz, size_t pgsz, size_t *out)
+{
+	uint32_t nb_mbufs = nb_ports * nb_mbufs_per_port;
+	uint64_t total_mem, mbuf_mem, obj_sz;
+
+	/* there is no good way to predict how much space the mempool will
+	 * occupy because it will allocate chunks on the fly, and some of those
+	 * will come from default DPDK memory while some will come from our
+	 * external memory, so just assume 16MB will be enough for everyone.
+	 */
+	uint64_t hdr_mem = 16 << 20;
+
+	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
+	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
+		/* contiguous - no need to account for page boundaries */
+		mbuf_mem = nb_mbufs * obj_sz;
+	} else {
+		/* account for possible non-contiguousness */
+		unsigned int n_pages, mbuf_per_pg, leftover;
+
+		mbuf_per_pg = pgsz / obj_sz;
+		leftover = (nb_mbufs % mbuf_per_pg) > 0;
+		n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
+
+		mbuf_mem = n_pages * pgsz;
+	}
+
+	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
+
+	if (total_mem > SIZE_MAX) {
+		printf("Memory size too big\n");
+		return -1;
+	}
+	*out = (size_t)total_mem;
+
+	return 0;
+}
+
+static inline uint32_t
+bsf64(uint64_t v)
+{
+	return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+	if (v == 0)
+		return 0;
+	v = rte_align64pow2(v);
+	return bsf64(v);
+}
+
+#ifndef MAP_HUGE_SHIFT
+#define HUGE_SHIFT 26
+#else
+#define HUGE_SHIFT MAP_HUGE_SHIFT
+#endif
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+	/* as per mmap() manpage, all page sizes are log2 of page size
+	 * shifted by MAP_HUGE_SHIFT
+	 */
+	int log2 = log2_u64(page_sz);
+	return log2 << HUGE_SHIFT;
+}
+
+static void *
+alloc_mem(size_t memsz, size_t pgsz)
+{
+	void *addr;
+	int flags;
+
+	/* allocate anonymous hugepages */
+	flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB | pagesz_flags(pgsz);
+
+	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
+	if (addr == MAP_FAILED)
+		return NULL;
+
+	return addr;
+}
+
+struct extmem_param {
+	void *addr;
+	size_t len;
+	size_t pgsz;
+	rte_iova_t *iova_table;
+	unsigned int iova_table_len;
+};
+
+static int
+create_extmem(uint32_t nb_ports, uint32_t nb_mbufs_per_port, uint32_t mbuf_sz,
+		struct extmem_param *param)
+{
+	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
+			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
+	unsigned int n_pages, cur_page, pgsz_idx;
+	size_t mem_sz, offset, cur_pgsz;
+	bool vfio_supported = true;
+	rte_iova_t *iovas = NULL;
+	void *addr;
+	int ret;
+
+	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
+		/* skip anything that is too big */
+		if (pgsizes[pgsz_idx] > SIZE_MAX)
+			continue;
+
+		cur_pgsz = pgsizes[pgsz_idx];
+
+		ret = calc_mem_size(nb_ports, nb_mbufs_per_port,
+				mbuf_sz, cur_pgsz, &mem_sz);
+		if (ret < 0) {
+			printf("Cannot calculate memory size\n");
+			return -1;
+		}
+
+		/* allocate our memory */
+		addr = alloc_mem(mem_sz, cur_pgsz);
+
+		/* if we couldn't allocate memory with a specified page size,
+		 * that doesn't mean we can't do it with other page sizes, so
+		 * try another one.
+		 */
+		if (addr == NULL)
+			continue;
+
+		/* store IOVA addresses for every page in this memory area */
+		n_pages = mem_sz / cur_pgsz;
+
+		iovas = malloc(sizeof(*iovas) * n_pages);
+
+		if (iovas == NULL) {
+			printf("Cannot allocate memory for iova addresses\n");
+			goto fail;
+		}
+
+		/* populate IOVA table */
+		for (cur_page = 0; cur_page < n_pages; cur_page++) {
+			rte_iova_t iova;
+			void *cur;
+
+			offset = cur_pgsz * cur_page;
+			cur = RTE_PTR_ADD(addr, offset);
+
+			iova = (uintptr_t)rte_mem_virt2iova(cur);
+
+			iovas[cur_page] = iova;
+
+			if (vfio_supported) {
+				/* map memory for DMA */
+				ret = rte_vfio_dma_map((uintptr_t)cur,
+						iova, cur_pgsz);
+				if (ret < 0) {
+					/*
+					 * ENODEV means VFIO is not initialized
+					 * ENOTSUP means current IOMMU mode
+					 * doesn't support mapping
+					 * both cases are not an error
+					 */
+					if (rte_errno == ENOTSUP ||
+							rte_errno == ENODEV)
+						/* VFIO is unsupported, don't
+						 * try again.
+						 */
+						vfio_supported = false;
+					else
+						/* this is an actual error */
+						goto fail;
+				}
+			}
+		}
+
+		break;
+	}
+	/* if we couldn't allocate anything */
+	if (iovas == NULL)
+		return -1;
+
+	param->addr = addr;
+	param->len = mem_sz;
+	param->pgsz = cur_pgsz;
+	param->iova_table = iovas;
+	param->iova_table_len = n_pages;
+
+	return 0;
+fail:
+	if (iovas)
+		free(iovas);
+	if (addr)
+		munmap(addr, mem_sz);
+
+	return -1;
+}
+
+static int
+setup_extmem(uint32_t nb_ports, uint32_t nb_mbufs_per_port, uint32_t mbuf_sz)
+{
+	struct extmem_param param;
+	int ret;
+
+	/* create our heap */
+	ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
+	if (ret < 0) {
+		printf("Cannot create heap\n");
+		return -1;
+	}
+
+	ret = create_extmem(nb_ports, nb_mbufs_per_port, mbuf_sz, &param);
+	if (ret < 0) {
+		printf("Cannot create memory area\n");
+		return -1;
+	}
+
+	/* we now have a valid memory area, so add it to heap */
+	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
+			param.addr, param.len, param.iova_table,
+			param.iova_table_len, param.pgsz);
+
+	/* not needed any more */
+	free(param.iova_table);
+
+	if (ret < 0) {
+		printf("Cannot add memory to heap\n");
+		munmap(param.addr, param.len);
+		return -1;
+	}
+
+	printf("Allocated %zuMB of memory\n", param.len >> 20);
+
+	/* success */
+	return 0;
+}
+
+
+/*
+ * The main function, which does initialization and calls the per-lcore
+ * functions.
+ */
+int
+main(int argc, char *argv[])
+{
+	struct rte_mempool *mbuf_pool;
+	unsigned int nb_ports;
+	int socket_id;
+	uint16_t portid;
+	uint32_t nb_mbufs_per_port, mbuf_sz;
+
+	/* Initialize the Environment Abstraction Layer (EAL). */
+	int ret = rte_eal_init(argc, argv);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
+
+	argc -= ret;
+	argv += ret;
+
+	/* Check that there is an even number of ports to send/receive on. */
+	nb_ports = rte_eth_dev_count_avail();
+	if (nb_ports < 2 || (nb_ports & 1))
+		rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
+
+	nb_mbufs_per_port = NUM_MBUFS;
+	mbuf_sz = RTE_MBUF_DEFAULT_BUF_SIZE;
+
+	if (setup_extmem(nb_ports, nb_mbufs_per_port, mbuf_sz) < 0)
+		rte_exit(EXIT_FAILURE, "Error: cannot set up external memory\n");
+
+	/* retrieve socket ID for our heap */
+	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+	if (socket_id < 0)
+		rte_exit(EXIT_FAILURE, "Invalid socket for external heap\n");
+
+	/* Creates a new mempool in memory to hold the mbufs. */
+	mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL",
+			nb_mbufs_per_port * nb_ports, MBUF_CACHE_SIZE, 0,
+			mbuf_sz, socket_id);
+
+	if (mbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
+
+	/* Initialize all ports. */
+	RTE_ETH_FOREACH_DEV(portid)
+		if (port_init(portid, mbuf_pool) != 0)
+			rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu16 "\n",
+					portid);
+
+	if (rte_lcore_count() > 1)
+		printf("\nWARNING: Too many lcores enabled. Only 1 used.\n");
+
+	/* Call lcore_main on the master core only. */
+	lcore_main();
+
+	return 0;
+}
diff --git a/examples/external_mem/meson.build b/examples/external_mem/meson.build
new file mode 100644
index 000000000..17a363ad2
--- /dev/null
+++ b/examples/external_mem/meson.build
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 Intel Corporation
+
+# meson file, for building this example as part of a main DPDK build.
+#
+# To build this example as a standalone application with an already-installed
+# DPDK instance, use 'make'
+
+allow_experimental_apis = true
+sources = files(
+	'extmem.c'
+)
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 18/20] doc: add external memory feature to the release notes
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (17 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 17/20] examples: add external memory example app Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 19/20] doc: add external memory feature to programmer's guide Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 20/20] doc: add external memory sample application guide Anatoly Burakov
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

Document the addition of external memory support to DPDK.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 63bbb1b51..9a05c9980 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,6 +67,11 @@ New Features
   SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
   vdev_netvsc, tap, and failsafe drivers combination.
 
+* **Added support for using externally allocated memory in DPDK.**
+
+  DPDK has gained support for creating new ``rte_malloc`` heaps referencing
+  memory that was created outside of DPDK's own page allocator, and using that
+  memory natively with any other DPDK library or data structure.
 
 API Changes
 -----------
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 19/20] doc: add external memory feature to programmer's guide
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (18 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 18/20] doc: add external memory feature to the release notes Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 20/20] doc: add external memory sample application guide Anatoly Burakov
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

Add a short chapter on usage of external memory in DPDK to the
Programmer's Guide.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d362c9209..37de8d63d 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -213,6 +213,44 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
+Support for Externally Allocated Memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to use externally allocated memory in DPDK, using a set of malloc
+heap APIs. Support for externally allocated memory is implemented through
+overloading the socket ID - externally allocated heaps will have socket IDs
+that would be considered invalid under normal circumstances. Requesting an
+allocation to take place from a specified externally allocated memory is a
+matter of supplying the correct socket ID to the DPDK allocator, either directly
+(e.g. through a call to ``rte_malloc_socket``) or indirectly (through data
+structure-specific allocation APIs such as ``rte_ring_create``).
+
+Since there is no way DPDK can verify whether a memory area is available or
+valid, this responsibility falls on the shoulders of the user. All multiprocess
+synchronization is also the user's responsibility, as well as ensuring that all
+calls to add/attach/detach/remove memory are done in the correct order. It is
+not required to attach to a memory area in all processes - only attach to memory
+areas as needed.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Create a named heap
+* Add memory area(s) to the heap
+  * If IOVA table is not specified, IOVA addresses will be assumed to be
+    unavailable
+  * Any DMA mappings for the external area are the responsibility of the user
+  * Other processes must attach to the memory area before they can use it
+* Get socket ID used for the heap
+* Use normal DPDK allocation procedures, using supplied socket ID
+* If memory area is no longer needed, it can be removed from the heap
+  * Other processes must detach from this memory area before it can be removed
+* If heap is no longer needed, remove it
+  * Socket ID will become invalid and will not be reused
+
+For more information, please refer to the ``rte_malloc`` API documentation,
+specifically the ``rte_malloc_heap_*`` family of function calls.
+
 PCI Access
 ~~~~~~~~~~
 
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v3 20/20] doc: add external memory sample application guide
  2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
                     ` (19 preceding siblings ...)
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 19/20] doc: add external memory feature to programmer's guide Anatoly Burakov
@ 2018-09-20 11:36   ` Anatoly Burakov
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

Add a guide for the external memory sample application. The
application is identical to the Basic Forwarding example in
everything except parts of the initialization code, so the parts
that are identical are not described.

The guide also does not describe how the external memory is
allocated, because users are expected to have their own mechanisms
for allocating memory outside of DPDK and to be interested only in
how to integrate said memory into DPDK.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
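
Note on the inputs to rte_malloc_heap_memory_add(): the guide below
lists a start address, a length, a page size and an optional per-page
IOVA table. The unit tests in patch 16/20 expect EINVAL for a NULL
address, a zero length, or an IOVA table whose entry count does not
match the number of pages. A caller-side pre-check along those lines
might look like the sketch below. This is not the DPDK implementation;
all example_* names are hypothetical, and the zero page size check is
an extra assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for rte_iova_t so the sketch builds without DPDK */
typedef uint64_t example_iova_t;

/*
 * Return 0 if the description of an external memory area looks
 * consistent, or -EINVAL otherwise.
 */
static int
example_check_extmem(const void *addr, size_t len,
		const example_iova_t *iova_table, unsigned int n_pages,
		size_t pgsz)
{
	/* zero address or zero length */
	if (addr == NULL || len == 0)
		return -EINVAL;

	/* assumption: a zero page size is never valid */
	if (pgsz == 0)
		return -EINVAL;

	/* the IOVA table is optional, but needs one entry per page */
	if (iova_table != NULL && n_pages != len / pgsz)
		return -EINVAL;

	return 0;
}
```

Running such a check before calling into DPDK keeps the failure close
to the code that built the IOVA table.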
 doc/guides/sample_app_ug/external_mem.rst | 115 ++++++++++++++++++++++
 doc/guides/sample_app_ug/index.rst        |   1 +
 2 files changed, 116 insertions(+)
 create mode 100644 doc/guides/sample_app_ug/external_mem.rst

diff --git a/doc/guides/sample_app_ug/external_mem.rst b/doc/guides/sample_app_ug/external_mem.rst
new file mode 100644
index 000000000..594c3397a
--- /dev/null
+++ b/doc/guides/sample_app_ug/external_mem.rst
@@ -0,0 +1,115 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2015-2018 Intel Corporation.
+
+External Memory Sample Application
+==================================
+
+The External Memory sample application is a simple *skeleton* example of a
+forwarding application using externally allocated memory.
+
+It is intended as a demonstration of the basic workflow of using externally
+allocated memory in DPDK. This application is based on the Basic Forwarding
+sample application, and differs only in its initialization path. For a more
+detailed explanation of port initialization and packet forwarding code, please
+refer to the *Basic Forwarding sample application user guide*.
+
+Compiling the Application
+-------------------------
+
+To compile the sample application see :doc:`compiling`.
+
+The application is located in the ``external_mem`` sub-directory.
+
+Running the Application
+-----------------------
+
+To run the example in a ``linuxapp`` environment:
+
+.. code-block:: console
+
+    ./build/extmem -l 1 -n 4
+
+Refer to *DPDK Getting Started Guide* for general information on running
+applications and the Environment Abstraction Layer (EAL) options.
+
+
+Explanation
+-----------
+
+For a general overview of the code and an explanation of the main components of
+this application, please refer to the *Basic Forwarding sample application user
+guide*. This guide will only explain sections of the code relevant to using
+external memory in DPDK.
+
+All DPDK library functions used in the sample code are prefixed with ``rte_``
+and are explained in detail in the *DPDK API Documentation*.
+
+
+External Memory Initialization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``main()`` function performs the initialization and calls the execution
+threads for each lcore.
+
+After initializing the Environment Abstraction Layer, the application also
+initializes external memory (in this case, it's allocating a chunk of memory
+using anonymous hugepages) inside the ``setup_extmem()`` local function.
+
+The first step in this process is to create an external heap:
+
+.. code-block:: c
+
+    ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
+    if (ret < 0) {
+        printf("Cannot create heap\n");
+        return -1;
+    }
+
+Once the heap is created, the ``create_extmem`` function is called to create
+the actual external memory area the application will be using. The details of
+that process are not described here, as they are not pertinent to the external
+memory API (it is expected that the user will have their own procedures to
+create external memory), but there are a few important things to note.
+
+In order to add an externally allocated memory area to the newly created heap,
+the application needs the following pieces of information:
+
+* Pointer to start address of external memory area
+* Length of this area
+* Page size of memory backing this memory area
+* Optionally, a per-page IOVA table
+
+All of this information is to be provided by the user. Additionally, if VFIO is
+in use and the application intends to do DMA using the memory area, VFIO DMA
+mapping must also be performed using the ``rte_vfio_dma_map`` function.
+
+Once the external memory is created and mapped for DMA, the application also has
+to add this memory to the heap that was created earlier:
+
+.. code-block:: c
+
+    ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
+            param.addr, param.len, param.iova_table,
+            param.iova_table_len, param.pgsz);
+
+If the return value indicates success, the memory area has been added to the
+heap. The next step is to retrieve the socket ID of this heap:
+
+.. code-block:: c
+
+    socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+    if (socket_id < 0)
+        rte_exit(EXIT_FAILURE, "Invalid socket for external heap\n");
+
+After that, the socket ID has to be supplied to the mempool creation function:
+
+.. code-block:: c
+
+    mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL",
+        nb_mbufs_per_port * nb_ports, MBUF_CACHE_SIZE, 0,
+        mbuf_sz, socket_id);
+
+The rest of the application is identical to the Basic Forwarding example.
+
+The forwarding loop can be interrupted and the application closed using
+``Ctrl-C``.
diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
index 5bedf4f6f..0536edb8e 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -15,6 +15,7 @@ Sample Applications User Guides
     exception_path
     hello_world
     skeleton
+    external_mem
     rxtx_callbacks
     flow_classify
     flow_filtering
-- 
2.17.1
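
The guide above lists what the application has to know about an external
memory area before adding it to a heap: start address, length, page size and,
optionally, a per-page IOVA table. As a minimal sketch of how those pieces fit
together, the descriptor and check below are hypothetical (``struct ext_area``
and ``ext_area_check`` are not DPDK names), and the alignment and table-length
checks are assumptions about what a sane description looks like, not a
statement of what the library enforces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; mirrors the fields the guide lists. */
struct ext_area {
	void *addr;          /* start of the external memory area */
	size_t len;          /* length of the area, in bytes */
	size_t pgsz;         /* page size of the memory backing the area */
	uint64_t *iova_tbl;  /* optional per-page IOVA table, may be NULL */
	unsigned int n_iova; /* number of entries in iova_tbl */
};

/* Return 0 if the description is self-consistent, -1 otherwise. */
static int
ext_area_check(const struct ext_area *a)
{
	if (a->addr == NULL || a->len == 0 || a->pgsz == 0)
		return -1;
	/* assumption: page size is a power of two */
	if ((a->pgsz & (a->pgsz - 1)) != 0)
		return -1;
	/* assumption: start address and length are page-aligned */
	if (((uintptr_t)a->addr % a->pgsz) != 0 || (a->len % a->pgsz) != 0)
		return -1;
	/* if a table is supplied, expect exactly one entry per page */
	if (a->iova_tbl != NULL && a->n_iova != a->len / a->pgsz)
		return -1;
	return 0;
}
```

A caller would fill in the descriptor after creating its memory area and run
the check before handing the same values to the heap API.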


* Re: [dpdk-dev] [PATCH v3 17/20] examples: add external memory example app
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 17/20] examples: add external memory example app Anatoly Burakov
@ 2018-09-20 22:47     ` Ananyev, Konstantin
  2018-09-21  9:03       ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Ananyev, Konstantin @ 2018-09-20 22:47 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, Wiles, Keith, Richardson, Bruce,
	thomas, shreyansh.jain, shahafs, arybchenko


Hi Anatoly

> 
> Introduce an example application demonstrating the use of
> external memory support. This is a simple application based on
> skeleton app, but instead of using internal DPDK memory, it is
> using externally allocated memory.
> 
> The RX/TX and init path is a carbon-copy of skeleton app, with
> no modifications whatsoever. The only difference is an additional
> init stage to allocate memory and create a heap for it, and the
> socket ID supplied to the mempool initialization function. The
> memory used by this app is hugepage memory allocated anonymously.
> 
> Anonymous hugepage memory will not be allocated in a NUMA-aware
> fashion, so there is a chance of performance degradation when
> using this app, but given that kernel usually gives hugepages on
> local socket first, this should not be a problem in most cases.

Do we need a new sample app just for that?
Couldn't it be added into testpmd, same, as we have now 'mp-anon'
to use mempool over anonymous memory?
Konstantin

> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  examples/external_mem/Makefile    |  62 ++++
>  examples/external_mem/extmem.c    | 461 ++++++++++++++++++++++++++++++
>  examples/external_mem/meson.build |  12 +
>  3 files changed, 535 insertions(+)
>  create mode 100644 examples/external_mem/Makefile
>  create mode 100644 examples/external_mem/extmem.c
>  create mode 100644 examples/external_mem/meson.build
> 
> diff --git a/examples/external_mem/Makefile b/examples/external_mem/Makefile
> new file mode 100644
> index 000000000..3b6ab3b2f
> --- /dev/null
> +++ b/examples/external_mem/Makefile
> @@ -0,0 +1,62 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2010-2018 Intel Corporation
> +
> +# binary name
> +APP = extmem
> +
> +# all source are stored in SRCS-y
> +SRCS-y := extmem.c
> +
> +# Build using pkg-config variables if possible
> +$(shell pkg-config --exists libdpdk)
> +ifeq ($(.SHELLSTATUS),0)
> +
> +all: shared
> +.PHONY: shared static
> +shared: build/$(APP)-shared
> +	ln -sf $(APP)-shared build/$(APP)
> +static: build/$(APP)-static
> +	ln -sf $(APP)-static build/$(APP)
> +
> +PC_FILE := $(shell pkg-config --path libdpdk)
> +CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
> +LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
> +
> +build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
> +	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
> +
> +build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
> +	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
> +
> +build:
> +	@mkdir -p $@
> +
> +.PHONY: clean
> +clean:
> +	rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
> +	rmdir --ignore-fail-on-non-empty build
> +
> +else # Build using legacy build system
> +
> +ifeq ($(RTE_SDK),)
> +$(error "Please define RTE_SDK environment variable")
> +endif
> +
> +# Default target, can be overridden by command line or environment
> +RTE_TARGET ?= x86_64-native-linuxapp-gcc
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +
> +# workaround for a gcc bug with noreturn attribute
> +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
> +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> +CFLAGS_main.o += -Wno-return-type
> +endif
> +
> +include $(RTE_SDK)/mk/rte.extapp.mk
> +endif
> diff --git a/examples/external_mem/extmem.c b/examples/external_mem/extmem.c
> new file mode 100644
> index 000000000..818a02171
> --- /dev/null
> +++ b/examples/external_mem/extmem.c
> @@ -0,0 +1,461 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2018 Intel Corporation
> + */
> +
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <stdbool.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +
> +#include <rte_eal.h>
> +#include <rte_ethdev.h>
> +#include <rte_cycles.h>
> +#include <rte_lcore.h>
> +#include <rte_mbuf.h>
> +#include <rte_malloc.h>
> +#include <rte_memory.h>
> +#include <rte_vfio.h>
> +
> +#define RX_RING_SIZE 1024
> +#define TX_RING_SIZE 1024
> +
> +#define NUM_MBUFS 8191
> +#define MBUF_CACHE_SIZE 250
> +#define BURST_SIZE 32
> +#define EXTMEM_HEAP_NAME "extmem"
> +
> +static const struct rte_eth_conf port_conf_default = {
> +	.rxmode = {
> +		.max_rx_pkt_len = ETHER_MAX_LEN,
> +	},
> +};
> +
> +/* extmem.c: Basic DPDK skeleton forwarding example using external memory. */
> +
> +/*
> + * Initializes a given port using global settings and with the RX buffers
> + * coming from the mbuf_pool passed as a parameter.
> + */
> +static inline int
> +port_init(uint16_t port, struct rte_mempool *mbuf_pool)
> +{
> +	struct rte_eth_conf port_conf = port_conf_default;
> +	const uint16_t rx_rings = 1, tx_rings = 1;
> +	uint16_t nb_rxd = RX_RING_SIZE;
> +	uint16_t nb_txd = TX_RING_SIZE;
> +	int retval;
> +	uint16_t q;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_txconf txconf;
> +
> +	if (!rte_eth_dev_is_valid_port(port))
> +		return -1;
> +
> +	rte_eth_dev_info_get(port, &dev_info);
> +	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE)
> +		port_conf.txmode.offloads |=
> +			DEV_TX_OFFLOAD_MBUF_FAST_FREE;
> +
> +	/* Configure the Ethernet device. */
> +	retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
> +	if (retval != 0)
> +		return retval;
> +
> +	retval = rte_eth_dev_adjust_nb_rx_tx_desc(port, &nb_rxd, &nb_txd);
> +	if (retval != 0)
> +		return retval;
> +
> +	/* Allocate and set up 1 RX queue per Ethernet port. */
> +	for (q = 0; q < rx_rings; q++) {
> +		retval = rte_eth_rx_queue_setup(port, q, nb_rxd,
> +				rte_eth_dev_socket_id(port), NULL, mbuf_pool);
> +		if (retval < 0)
> +			return retval;
> +	}
> +
> +	txconf = dev_info.default_txconf;
> +	txconf.offloads = port_conf.txmode.offloads;
> +	/* Allocate and set up 1 TX queue per Ethernet port. */
> +	for (q = 0; q < tx_rings; q++) {
> +		retval = rte_eth_tx_queue_setup(port, q, nb_txd,
> +				rte_eth_dev_socket_id(port), &txconf);
> +		if (retval < 0)
> +			return retval;
> +	}
> +
> +	/* Start the Ethernet port. */
> +	retval = rte_eth_dev_start(port);
> +	if (retval < 0)
> +		return retval;
> +
> +	/* Display the port MAC address. */
> +	struct ether_addr addr;
> +	rte_eth_macaddr_get(port, &addr);
> +	printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8
> +			   " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n",
> +			port,
> +			addr.addr_bytes[0], addr.addr_bytes[1],
> +			addr.addr_bytes[2], addr.addr_bytes[3],
> +			addr.addr_bytes[4], addr.addr_bytes[5]);
> +
> +	/* Enable RX in promiscuous mode for the Ethernet device. */
> +	rte_eth_promiscuous_enable(port);
> +
> +	return 0;
> +}
> +
> +/*
> + * The lcore main. This is the main thread that does the work, reading from
> + * an input port and writing to an output port.
> + */
> +static __attribute__((noreturn)) void
> +lcore_main(void)
> +{
> +	uint16_t port;
> +
> +	/*
> +	 * Check that the port is on the same NUMA node as the polling thread
> +	 * for best performance.
> +	 */
> +	RTE_ETH_FOREACH_DEV(port)
> +		if (rte_eth_dev_socket_id(port) > 0 &&
> +				rte_eth_dev_socket_id(port) !=
> +						(int)rte_socket_id())
> +			printf("WARNING, port %u is on remote NUMA node to "
> +					"polling thread.\n\tPerformance will "
> +					"not be optimal.\n", port);
> +
> +	printf("\nCore %u forwarding packets. [Ctrl+C to quit]\n",
> +			rte_lcore_id());
> +
> +	/* Run until the application is quit or killed. */
> +	for (;;) {
> +		/*
> +		 * Receive packets on a port and forward them on the paired
> +		 * port. The mapping is 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, etc.
> +		 */
> +		RTE_ETH_FOREACH_DEV(port) {
> +
> +			/* Get burst of RX packets, from first port of pair. */
> +			struct rte_mbuf *bufs[BURST_SIZE];
> +			const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
> +					bufs, BURST_SIZE);
> +
> +			if (unlikely(nb_rx == 0))
> +				continue;
> +
> +			/* Send burst of TX packets, to second port of pair. */
> +			const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
> +					bufs, nb_rx);
> +
> +			/* Free any unsent packets. */
> +			if (unlikely(nb_tx < nb_rx)) {
> +				uint16_t buf;
> +				for (buf = nb_tx; buf < nb_rx; buf++)
> +					rte_pktmbuf_free(bufs[buf]);
> +			}
> +		}
> +	}
> +}
> +
> +/* extremely pessimistic estimation of memory required to create a mempool */
> +static int
> +calc_mem_size(uint32_t nb_ports, uint32_t nb_mbufs_per_port,
> +		uint32_t mbuf_sz, size_t pgsz, size_t *out)
> +{
> +	uint32_t nb_mbufs = nb_ports * nb_mbufs_per_port;
> +	uint64_t total_mem, mbuf_mem, obj_sz;
> +
> +	/* there is no good way to predict how much space the mempool will
> +	 * occupy because it will allocate chunks on the fly, and some of those
> +	 * will come from default DPDK memory while some will come from our
> +	 * external memory, so just assume 16MB will be enough for everyone.
> +	 */
> +	uint64_t hdr_mem = 16 << 20;
> +
> +	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
> +	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> +		/* contiguous - no need to account for page boundaries */
> +		mbuf_mem = nb_mbufs * obj_sz;
> +	} else {
> +		/* account for possible non-contiguousness */
> +		unsigned int n_pages, mbuf_per_pg, leftover;
> +
> +		mbuf_per_pg = pgsz / obj_sz;
> +		leftover = (nb_mbufs % mbuf_per_pg) > 0;
> +		n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
> +
> +		mbuf_mem = n_pages * pgsz;
> +	}
> +
> +	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
> +
> +	if (total_mem > SIZE_MAX) {
> +		printf("Memory size too big\n");
> +		return -1;
> +	}
> +	*out = (size_t)total_mem;
> +
> +	return 0;
> +}
> +
> +static inline uint32_t
> +bsf64(uint64_t v)
> +{
> +	return (uint32_t)__builtin_ctzll(v);
> +}
> +
> +static inline uint32_t
> +log2_u64(uint64_t v)
> +{
> +	if (v == 0)
> +		return 0;
> +	v = rte_align64pow2(v);
> +	return bsf64(v);
> +}
> +
> +#ifndef MAP_HUGE_SHIFT
> +#define HUGE_SHIFT 26
> +#else
> +#define HUGE_SHIFT MAP_HUGE_SHIFT
> +#endif
> +
> +static int
> +pagesz_flags(uint64_t page_sz)
> +{
> +	/* as per mmap() manpage, all page sizes are log2 of page size
> +	 * shifted by MAP_HUGE_SHIFT
> +	 */
> +	int log2 = log2_u64(page_sz);
> +	return log2 << HUGE_SHIFT;
> +}
> +
> +static void *
> +alloc_mem(size_t memsz, size_t pgsz)
> +{
> +	void *addr;
> +	int flags;
> +
> +	/* allocate anonymous hugepages */
> +	flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB | pagesz_flags(pgsz);
> +
> +	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
> +	if (addr == MAP_FAILED)
> +		return NULL;
> +
> +	return addr;
> +}
> +
> +struct extmem_param {
> +	void *addr;
> +	size_t len;
> +	size_t pgsz;
> +	rte_iova_t *iova_table;
> +	unsigned int iova_table_len;
> +};
> +
> +static int
> +create_extmem(uint32_t nb_ports, uint32_t nb_mbufs_per_port, uint32_t mbuf_sz,
> +		struct extmem_param *param)
> +{
> +	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
> +			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
> +	unsigned int n_pages, cur_page, pgsz_idx;
> +	size_t mem_sz, offset, cur_pgsz;
> +	bool vfio_supported = true;
> +	rte_iova_t *iovas = NULL;
> +	void *addr;
> +	int ret;
> +
> +	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
> +		/* skip anything that is too big */
> +		if (pgsizes[pgsz_idx] > SIZE_MAX)
> +			continue;
> +
> +		cur_pgsz = pgsizes[pgsz_idx];
> +
> +		ret = calc_mem_size(nb_ports, nb_mbufs_per_port,
> +				mbuf_sz, cur_pgsz, &mem_sz);
> +		if (ret < 0) {
> +			printf("Cannot calculate memory size\n");
> +			return -1;
> +		}
> +
> +		/* allocate our memory */
> +		addr = alloc_mem(mem_sz, cur_pgsz);
> +
> +		/* if we couldn't allocate memory with a specified page size,
> +		 * that doesn't mean we can't do it with other page sizes, so
> +		 * try another one.
> +		 */
> +		if (addr == NULL)
> +			continue;
> +
> +		/* store IOVA addresses for every page in this memory area */
> +		n_pages = mem_sz / cur_pgsz;
> +
> +		iovas = malloc(sizeof(*iovas) * n_pages);
> +
> +		if (iovas == NULL) {
> +			printf("Cannot allocate memory for iova addresses\n");
> +			goto fail;
> +		}
> +
> +		/* populate IOVA table */
> +		for (cur_page = 0; cur_page < n_pages; cur_page++) {
> +			rte_iova_t iova;
> +			void *cur;
> +
> +			offset = cur_pgsz * cur_page;
> +			cur = RTE_PTR_ADD(addr, offset);
> +
> +			iova = (uintptr_t)rte_mem_virt2iova(cur);
> +
> +			iovas[cur_page] = iova;
> +
> +			if (vfio_supported) {
> +				/* map memory for DMA */
> +				ret = rte_vfio_dma_map((uintptr_t)cur,
> +						iova, cur_pgsz);
> +				if (ret < 0) {
> +					/*
> +					 * ENODEV means VFIO is not initialized
> +					 * ENOTSUP means current IOMMU mode
> +					 * doesn't support mapping
> +					 * both cases are not an error
> +					 */
> +					if (rte_errno == ENOTSUP ||
> +							rte_errno == ENODEV)
> +						/* VFIO is unsupported, don't
> +						 * try again.
> +						 */
> +						vfio_supported = false;
> +					else
> +						/* this is an actual error */
> +						goto fail;
> +				}
> +			}
> +		}
> +
> +		break;
> +	}
> +	/* if we couldn't allocate anything */
> +	if (iovas == NULL)
> +		return -1;
> +
> +	param->addr = addr;
> +	param->len = mem_sz;
> +	param->pgsz = cur_pgsz;
> +	param->iova_table = iovas;
> +	param->iova_table_len = n_pages;
> +
> +	return 0;
> +fail:
> +	if (iovas)
> +		free(iovas);
> +	if (addr)
> +		munmap(addr, mem_sz);
> +
> +	return -1;
> +}
> +
> +static int
> +setup_extmem(uint32_t nb_ports, uint32_t nb_mbufs_per_port, uint32_t mbuf_sz)
> +{
> +	struct extmem_param param;
> +	int ret;
> +
> +	/* create our heap */
> +	ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
> +	if (ret < 0) {
> +		printf("Cannot create heap\n");
> +		return -1;
> +	}
> +
> +	ret = create_extmem(nb_ports, nb_mbufs_per_port, mbuf_sz, &param);
> +	if (ret < 0) {
> +		printf("Cannot create memory area\n");
> +		return -1;
> +	}
> +
> +	/* we now have a valid memory area, so add it to heap */
> +	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
> +			param.addr, param.len, param.iova_table,
> +			param.iova_table_len, param.pgsz);
> +
> +	/* not needed any more */
> +	free(param.iova_table);
> +
> +	if (ret < 0) {
> +		printf("Cannot add memory to heap\n");
> +		munmap(param.addr, param.len);
> +		return -1;
> +	}
> +
> +	printf("Allocated %zuMB of memory\n", param.len >> 20);
> +
> +	/* success */
> +	return 0;
> +}
> +
> +
> +/*
> + * The main function, which does initialization and calls the per-lcore
> + * functions.
> + */
> +int
> +main(int argc, char *argv[])
> +{
> +	struct rte_mempool *mbuf_pool;
> +	unsigned int nb_ports;
> +	int socket_id;
> +	uint16_t portid;
> +	uint32_t nb_mbufs_per_port, mbuf_sz;
> +
> +	/* Initialize the Environment Abstraction Layer (EAL). */
> +	int ret = rte_eal_init(argc, argv);
> +	if (ret < 0)
> +		rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
> +
> +	argc -= ret;
> +	argv += ret;
> +
> +	/* Check that there is an even number of ports to send/receive on. */
> +	nb_ports = rte_eth_dev_count_avail();
> +	if (nb_ports < 2 || (nb_ports & 1))
> +		rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
> +
> +	nb_mbufs_per_port = NUM_MBUFS;
> +	mbuf_sz = RTE_MBUF_DEFAULT_BUF_SIZE;
> +
> +	if (setup_extmem(nb_ports, nb_mbufs_per_port, mbuf_sz) < 0)
> +		rte_exit(EXIT_FAILURE, "Error: cannot set up external memory\n");
> +
> +	/* retrieve socket ID for our heap */
> +	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
> +	if (socket_id < 0)
> +		rte_exit(EXIT_FAILURE, "Invalid socket for external heap\n");
> +
> +	/* Creates a new mempool in memory to hold the mbufs. */
> +	mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL",
> +			nb_mbufs_per_port * nb_ports, MBUF_CACHE_SIZE, 0,
> +			mbuf_sz, socket_id);
> +
> +	if (mbuf_pool == NULL)
> +		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
> +
> +	/* Initialize all ports. */
> +	RTE_ETH_FOREACH_DEV(portid)
> +		if (port_init(portid, mbuf_pool) != 0)
> +			rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu16 "\n",
> +					portid);
> +
> +	if (rte_lcore_count() > 1)
> +		printf("\nWARNING: Too many lcores enabled. Only 1 used.\n");
> +
> +	/* Call lcore_main on the master core only. */
> +	lcore_main();
> +
> +	return 0;
> +}
> diff --git a/examples/external_mem/meson.build b/examples/external_mem/meson.build
> new file mode 100644
> index 000000000..17a363ad2
> --- /dev/null
> +++ b/examples/external_mem/meson.build
> @@ -0,0 +1,12 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2017 Intel Corporation
> +
> +# meson file, for building this example as part of a main DPDK build.
> +#
> +# To build this example as a standalone application with an already-installed
> +# DPDK instance, use 'make'
> +
> +allow_experimental_apis = true
> +sources = files(
> +	'extmem.c'
> +)
> --
> 2.17.1
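
The page-size flag computation in the quoted pagesz_flags() can be checked in
isolation. The sketch below is a stand-alone variant, not the patch's code:
the names log2_pow2 and pagesz_flags_u32 are made up here, the shift of 26 is
the fallback value from the quoted code, and the arithmetic is done in an
unsigned type on purpose, since 34 << 26 (the 16G case) does not fit in a
signed 32-bit int.

```c
#include <stdint.h>

#define HUGE_SHIFT 26 /* same fallback value as in the quoted code */

/* log2 of a page size; the input is assumed to be a power of two */
static unsigned int
log2_pow2(uint64_t v)
{
	unsigned int n = 0;

	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/* encode a page size as log2(page size) shifted by HUGE_SHIFT */
static uint32_t
pagesz_flags_u32(uint64_t page_sz)
{
	return (uint32_t)log2_pow2(page_sz) << HUGE_SHIFT;
}
```

For a 2M page this yields 21 << 26 and for a 1G page 30 << 26, which is the
encoding the mmap() manpage describes for the hugepage size flags.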


* Re: [dpdk-dev] [PATCH v3 17/20] examples: add external memory example app
  2018-09-20 22:47     ` Ananyev, Konstantin
@ 2018-09-21  9:03       ` Burakov, Anatoly
  0 siblings, 0 replies; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-21  9:03 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, Wiles, Keith, Richardson, Bruce,
	thomas, shreyansh.jain, shahafs, arybchenko

On 20-Sep-18 11:47 PM, Ananyev, Konstantin wrote:
> 
> Hi Anatoly
> 
>>
>> Introduce an example application demonstrating the use of
>> external memory support. This is a simple application based on
>> skeleton app, but instead of using internal DPDK memory, it is
>> using externally allocated memory.
>>
>> The RX/TX and init path is a carbon-copy of skeleton app, with
>> no modifications whatsoever. The only difference is an additional
>> init stage to allocate memory and create a heap for it, and the
>> socket ID supplied to the mempool initialization function. The
>> memory used by this app is hugepage memory allocated anonymously.
>>
>> Anonymous hugepage memory will not be allocated in a NUMA-aware
>> fashion, so there is a chance of performance degradation when
>> using this app, but given that kernel usually gives hugepages on
>> local socket first, this should not be a problem in most cases.
> 
> Do we need a new sample app just for that?
> Couldn't it be added into testpmd, same, as we have now 'mp-anon'
> to use mempool over anonymous memory?
> Konstantin
> 
Hi Konstantin,

The reason i made a new sample app and not put it in testpmd is that i 
felt putting it in testpmd hurts discoverability of sample code. 
However, i can do either, or both?

-- 
Thanks,
Anatoly


* [dpdk-dev] [PATCH v4 00/20] Support externally allocated memory in DPDK
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-23 21:21       ` Thomas Monjalon
                         ` (22 more replies)
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 01/20] mem: add length to memseg list Anatoly Burakov
                       ` (19 subsequent siblings)
  20 siblings, 23 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.

Using external memory in multiprocess is *hard*, because not only
memory space needs to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

Creating and destroying heaps is currently restricted to primary
processes, because we need to keep track of all socket ID's we've ever
used to prevent their reuse, and obviously different processes would
have kept different socket ID counters, and it isn't important enough
to put into shared memory. This means that secondary processes will
not be able to create new heaps. If this use case is important
enough, we can put the max socket ID into shared memory, or allow
socket ID reuse (which i do not think is a good idea because it has
the potential to make things harder to debug).
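
To illustrate the socket ID bookkeeping described above (every new heap gets a
unique socket ID, and IDs are never reused even after a heap is destroyed),
here is a toy model. It is not the patchset's implementation: all toy_* names,
the table size and the starting value of 256 are hypothetical and chosen only
for the sketch.

```c
#include <string.h>

#define TOY_MAX_HEAPS 8
#define TOY_FIRST_EXT_SOCKET 256 /* hypothetical starting value */

struct toy_heap {
	char name[32];
	int socket_id;
	int in_use;
};

static struct toy_heap toy_heaps[TOY_MAX_HEAPS];
/* only ever incremented, so an ID is never handed out twice */
static int toy_next_socket = TOY_FIRST_EXT_SOCKET;

/* return the slot index of a named heap, or -1 if there is none */
static int
toy_heap_find(const char *name)
{
	int i;

	if (name == NULL)
		return -1;
	for (i = 0; i < TOY_MAX_HEAPS; i++)
		if (toy_heaps[i].in_use &&
				strcmp(toy_heaps[i].name, name) == 0)
			return i;
	return -1;
}

/* create a named heap; returns its socket ID, or -1 on failure */
static int
toy_heap_create(const char *name)
{
	int i;

	if (name == NULL || name[0] == '\0' ||
			strlen(name) >= sizeof(toy_heaps[0].name))
		return -1;
	if (toy_heap_find(name) >= 0)
		return -1; /* heap names must be unique */
	for (i = 0; i < TOY_MAX_HEAPS; i++) {
		if (toy_heaps[i].in_use)
			continue;
		strcpy(toy_heaps[i].name, name);
		toy_heaps[i].socket_id = toy_next_socket++;
		toy_heaps[i].in_use = 1;
		return toy_heaps[i].socket_id;
	}
	return -1; /* no free slot */
}

/* look up the socket ID of a named heap, or -1 if it does not exist */
static int
toy_heap_get_socket(const char *name)
{
	int idx = toy_heap_find(name);

	return idx < 0 ? -1 : toy_heaps[idx].socket_id;
}

/* destroy a heap; the slot is freed but the ID counter is left alone */
static int
toy_heap_destroy(const char *name)
{
	int idx = toy_heap_find(name);

	if (idx < 0)
		return -1;
	toy_heaps[idx].in_use = 0;
	return 0;
}
```

Because the counter lives in ordinary process memory, two processes running
this model would each hand out their own sequence of IDs, which is the reason
given above for restricting heap creation to the primary process.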

v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
  IOVA-contiguousness when dealing with externally allocated memory

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
  comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (20):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: add function to check if socket is external
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  test: add unit tests for external memory support
  app/testpmd: add support for external memory
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide

 app/test-pmd/config.c                         |  21 +-
 app/test-pmd/parameters.c                     |  23 +-
 app/test-pmd/testpmd.c                        | 337 +++++++++++++-
 app/test-pmd/testpmd.h                        |  13 +-
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  38 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  24 +-
 doc/guides/testpmd_app_ug/run_app.rst         |  12 +
 drivers/bus/fslmc/fslmc_vfio.c                |   7 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   6 +-
 lib/librte_eal/common/include/rte_malloc.h    | 198 +++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_elem.c           |  10 +-
 lib/librte_eal/common/malloc_heap.c           | 300 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 420 +++++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  17 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   8 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  57 ++-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 ++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 46 files changed, 1897 insertions(+), 137 deletions(-)
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1


* [dpdk-dev] [PATCH v4 01/20] mem: add length to memseg list
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
                       ` (18 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, shreyansh.jain, shahafs, arybchenko

Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
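As a small illustration of the change described above, the sketch below
contrasts deriving the length from page size and array length with reading a
stored length. The struct is a hypothetical, cut-down stand-in for a memseg
list (toy_msl and its helpers are not DPDK names), with n_segs playing the
role of memseg_arr.len.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for a memseg list. */
struct toy_msl {
	void *base_va;       /* start of the VA area */
	uint64_t page_sz;    /* page size of every segment in the list */
	unsigned int n_segs; /* stands in for memseg_arr.len */
	size_t len;          /* stored length of the area covered */
};

/* old way: derive the length from page size and array length */
static size_t
toy_msl_len_derived(const struct toy_msl *msl)
{
	return (size_t)msl->page_sz * msl->n_segs;
}

/* new way: range-check an address against the stored length */
static int
toy_msl_contains(const struct toy_msl *msl, const void *addr)
{
	uintptr_t start = (uintptr_t)msl->base_va;
	uintptr_t end = start + msl->len;
	uintptr_t a = (uintptr_t)addr;

	return a >= start && a < end;
}
```

With the length stored, callers such as the range checks touched by this patch
no longer need to know how the backing array is laid out.
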
 drivers/bus/pci/linux/pci.c                       | 2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
 lib/librte_eal/common/eal_common_memory.c         | 5 ++---
 lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
 6 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 static int
 find_max_end_va(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t sz = msl->memseg_arr.len * msl->page_sz;
+	size_t sz = msl->len;
 	void *end_va = RTE_PTR_ADD(msl->base_va, sz);
 	void **max_va = arg;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
 		}
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
+		msl->len = internal_config.memory;
 		msl->socket_id = 0;
 
 		/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
 
 	/* a memseg list was specified, check if it's the right one */
 	start = msl->base_va;
-	end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+	end = RTE_PTR_ADD(start, msl->len);
 
 	if (addr < start || addr >= end)
 		return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
 		msl = &mcfg->memsegs[msl_idx];
 
 		start = msl->base_va;
-		end = RTE_PTR_ADD(start,
-				(size_t)msl->page_sz * msl->memseg_arr.len);
+		end = RTE_PTR_ADD(start, msl->len);
 		if (addr >= start && addr < end)
 			break;
 	}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
+	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	int msl_idx, seg_idx, ret, dir_fd = -1;
 
 	start_addr = (uintptr_t) msl->base_va;
-	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+	end_addr = start_addr + msl->len;
 
 	if ((uintptr_t)wa->ms->addr < start_addr ||
 			(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 		return -1;
 	}
 	local_msl->base_va = primary_msl->base_va;
+	local_msl->len = primary_msl->len;
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
 		msl->socket_id = 0;
+		msl->len = internal_config.memory;
 
 		/* populate memsegs. each memseg is one page long */
 		for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
 		if (msl->memseg_arr.count > 0)
 			continue;
 		/* this is an unused list, deallocate it */
-		mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
 
-- 
2.17.1

* [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 01/20] mem: add length to memseg list Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 03/20] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
                       ` (17 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in the release notes. This also breaks a few
internal assumptions about memory contiguity, so adjust the
malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
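
The pattern applied to most walk callbacks can be sketched roughly as
follows. Everything prefixed with sketch_ is hypothetical and exists
only for this example: the walker is a simplified stand-in for the
real memseg list walk, and the callback is modelled on physmem_size()
from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for a memseg list. */
struct sketch_msl {
	uint64_t page_sz;      /* page size for all segments in this list */
	unsigned int count;    /* number of pages currently in use */
	unsigned int external; /* 1 if this list points to external memory */
};

typedef int (*sketch_walk_t)(const struct sketch_msl *msl, void *arg);

/*
 * Simplified walker: call func on every list, internal or external,
 * and stop early if the callback returns non-zero.
 */
static int
sketch_list_walk(const struct sketch_msl *lists, size_t n,
		sketch_walk_t func, void *arg)
{
	size_t i;

	assert(func != NULL);

	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback that only accounts for DPDK-internal memory. */
static int
sketch_physmem_size(const struct sketch_msl *msl, void *arg)
{
	uint64_t *total_len = arg;

	/* the walker does not filter, so skip external lists here */
	if (msl->external)
		return 0;

	*total_len += (uint64_t)msl->count * msl->page_sz;
	return 0;
}
```

Returning 0 from the callback lets the walk continue, so an external
list is passed over without aborting the iteration.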

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, assuming that
every known page size was relevant was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating the minimum page size for a mempool.
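
A minimal sketch of that matching rule, assuming a flat array of
lists instead of the real walk API. The sketch_* names, the fallback
parameter and the socket ID value used for the external list in the
usage note are made up for illustration; SKETCH_SOCKET_ID_ANY mirrors
SOCKET_ID_ANY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1)

/* Hypothetical, cut-down stand-in for a memseg list. */
struct sketch_msl {
	int socket_id;
	uint64_t page_sz;
	unsigned int external; /* 1 if this list points to external memory */
};

/*
 * Return the smallest page size usable for socket_id, or fallback if
 * no list qualifies. A list qualifies on an exact socket ID match
 * (internal or external), or, when any socket is acceptable, only if
 * it is internal memory.
 */
static size_t
sketch_min_page_size(const struct sketch_msl *lists, size_t n,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct sketch_msl *msl = &lists[i];
		bool valid;

		valid = msl->socket_id == socket_id;
		valid |= socket_id == SKETCH_SOCKET_ID_ANY &&
				msl->external == 0;

		if (valid && msl->page_sz < min)
			min = msl->page_sz;
	}
	return min == SIZE_MAX ? fallback : min;
}
```

With internal lists on sockets 0 (2M pages) and 1 (1G pages) plus an
external list using 4K pages on a made-up socket ID, a request for any
socket yields 2M: the external 4K pages are ignored unless that
external socket ID is requested explicitly.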

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        | 12 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_elem.c           | 10 ++++--
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/common/rte_malloc.c            |  2 +-
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 24 files changed, 133 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..e96ec9b43 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -107,6 +114,9 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK.
+
 Removed Items
 -------------
 
@@ -152,7 +162,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 	contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
 
 	/* if we're in IOVA as VA mode, or if we're in legacy mode with
-	 * hugepages, all elements are IOVA-contiguous.
+	 * hugepages, all elements are IOVA-contiguous. however, we can only
+	 * make these assumptions about internal memory - externally allocated
+	 * segments have to be checked.
 	 */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA ||
-			(internal_config.legacy_mem && rte_eal_has_hugepages()))
+	if (!elem->msl->external &&
+			(rte_eal_iova_mode() == RTE_IOVA_VA ||
+				(internal_config.legacy_mem &&
+					rte_eal_has_hugepages())))
 		return RTE_PTR_DIFF(data_end, contig_seg_start);
 
 	cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
 	if (elem == NULL)
 		return RTE_BAD_IOVA;
 
-	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+	if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return (uintptr_t) addr;
 
 	ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1

* [dpdk-dev] [PATCH v4 03/20] malloc: index heaps using heap ID rather than NUMA node
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (2 preceding siblings ...)
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID Anatoly Burakov
                       ` (16 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bruce Richardson, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, shreyansh.jain, shahafs, arybchenko

Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. The heap ID of a DPDK-internal heap is the
NUMA node's index within the detected NUMA node list. Heap IDs for
external heaps will follow the order of their creation.
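
The lookup from socket ID to heap ID can be sketched as a linear
search, along the lines of malloc_socket_to_heap_id() in this patch.
The sketch_* names are hypothetical, and the sketch assumes that the
socket_id of every heap in the array has already been populated; the
explicit heap count is an addition made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a malloc heap. */
struct sketch_heap {
	unsigned int socket_id; /* socket ID this heap allocates for */
};

/*
 * Return the index of the heap serving socket_id, or -1 if no heap
 * in the first n_heaps entries matches.
 */
static int
sketch_socket_to_heap_id(const struct sketch_heap *heaps, size_t n_heaps,
		unsigned int socket_id)
{
	size_t i;

	assert(heaps != NULL || n_heaps == 0);

	for (i = 0; i < n_heaps; i++) {
		if (heaps[i].socket_id == socket_id)
			return (int)i;
	}
	return -1;
}
```

For example, if NUMA nodes 0 and 2 were detected, they map to heap
IDs 0 and 1; an external heap created afterwards takes the next heap
ID, whatever socket ID it was assigned.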

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_base                            |  1 +
 config/rte_config.h                           |  1 +
 .../common/include/rte_eal_memconfig.h        |  4 +-
 .../common/include/rte_malloc_heap.h          |  1 +
 lib/librte_eal/common/malloc_heap.c           | 98 +++++++++++++------
 lib/librte_eal/common/malloc_heap.h           |  3 +
 lib/librte_eal/common/rte_malloc.c            | 41 +++++---
 7 files changed, 106 insertions(+), 43 deletions(-)

diff --git a/config/common_base b/config/common_base
index 155c7d40e..b52770b27 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 567051b9c..5dd2ac1ad 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
 #define RTE_BUILD_SHARED_LIB
 
 /* EAL defines */
+#define RTE_MAX_HEAPS 32
 #define RTE_MAX_MEMSEG_LISTS 128
 #define RTE_MAX_MEMSEG_PER_LIST 8192
 #define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 6baa6854f..d7920a4e0 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
 
-	/* Heaps of Malloc per socket */
-	struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+	/* Heaps of Malloc */
+	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {
 
 	unsigned alloc_count;
 	size_t total_size;
+	unsigned int socket_id;
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..1d1e35708 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 	return check_flag & flags;
 }
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (heap->socket_id == socket_id)
+			return i;
+	}
+	return -1;
+}
+
 /*
  * Expand the heap with a memory area.
  */
@@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct rte_memseg_list *found_msl;
 	struct malloc_heap *heap;
-	int msl_idx;
+	int msl_idx, heap_idx;
 
 	if (msl->external)
 		return 0;
 
-	heap = &mcfg->malloc_heaps[msl->socket_id];
+	heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+	heap = &mcfg->malloc_heaps[heap_idx];
 
 	/* msl is const, so find it */
 	msl_idx = msl - mcfg->memsegs;
@@ -111,6 +127,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
+	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -561,12 +578,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 
 /* this will try lower page sizes first */
 static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
-		unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+	int socket_id;
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -584,12 +603,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
 	 * we may still be able to allocate memory from appropriate page sizes,
 	 * we just need to request more memory first.
 	 */
+
+	socket_id = rte_socket_id_by_idx(heap_id);
+	/*
+	 * if socket ID is negative, we cannot find a socket ID for this heap -
+	 * which means it's an external heap. those can have unexpected page
+	 * sizes, so if the user asked to allocate from there - assume user
+	 * knows what they're doing, and allow allocating from there with any
+	 * page size flags.
+	 */
+	if (socket_id < 0)
+		size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
 	ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
 	if (ret != NULL)
 		goto alloc_unlock;
 
-	if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
-			contig)) {
+	/* if socket ID is invalid, this is an external heap */
+	if (socket_id < 0)
+		goto alloc_unlock;
+
+	if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+			bound, contig)) {
 		ret = heap_alloc(heap, type, size, flags, align, bound, contig);
 
 		/* this should have succeeded */
@@ -605,7 +640,7 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, heap_id, i;
 	void *ret;
 
 	/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +655,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
-			contig);
+	ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+			bound, contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
-	/* try other heaps */
+	/* try other heaps. we are only iterating through native DPDK sockets,
+	 * so external heaps won't be included.
+	 */
 	for (i = 0; i < (int) rte_socket_count(); i++) {
-		cur_socket = rte_socket_id_by_idx(i);
-		if (cur_socket == socket)
+		if (i == heap_id)
 			continue;
-		ret = heap_alloc_on_socket(type, size, cur_socket, flags,
-				align, bound, contig);
+		ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+				bound, contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -643,11 +681,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 }
 
 static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
-		size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+		unsigned int flags, size_t align, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -665,7 +703,7 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		size_t align, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, i, cur_socket, heap_id;
 	void *ret;
 
 	/* return NULL if align is not power-of-2 */
@@ -680,11 +718,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+	ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
 			contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
@@ -694,8 +734,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		cur_socket = rte_socket_id_by_idx(i);
 		if (cur_socket == socket)
 			continue;
-		ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
-				align, contig);
+		ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+				contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -760,7 +800,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* ...of which we can't avail if we are in legacy mode, or if this is an
 	 * externally allocated segment.
 	 */
-	if (internal_config.legacy_mem || msl->external)
+	if (internal_config.legacy_mem || (msl->external > 0))
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
@@ -917,7 +957,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 int
 malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +995,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f);
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
 int
 rte_eal_malloc_heap_init(void);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 47ca5a742..73d6df31d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx, ret = -1;
 
-	if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
-		return -1;
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
 
-	return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+	heap_idx = malloc_socket_to_heap_id(socket);
+	if (heap_idx < 0)
+		goto unlock;
+
+	ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+			socket_stats);
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
 }
 
 /*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int idx;
 
-	for (idx = 0; idx < rte_socket_count(); idx++) {
-		unsigned int socket = rte_socket_id_by_idx(idx);
-		fprintf(f, "Heap on socket %i:\n", socket);
-		malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		fprintf(f, "Heap id: %u\n", idx);
+		malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
 	}
 
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
 /*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
 void
 rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 {
-	unsigned int socket;
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int heap_id;
 	struct rte_malloc_socket_stats sock_stats;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
 	/* Iterate through all initialised heaps */
-	for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
-		if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
-			continue;
+	for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 
-		fprintf(f, "Socket:%u\n", socket);
+		malloc_heap_get_stats(heap, &sock_stats);
+
+		fprintf(f, "Heap id:%u\n", heap_id);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
 		fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
 	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	return;
 }
 
-- 
2.17.1

* [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (3 preceding siblings ...)
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 03/20] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 05/20] flow_classify: " Anatoly Burakov
                       ` (15 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
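
As an illustration of the idea (a simplified standalone model, not the
EAL code itself), validity can be decided by looking the socket ID up
in the table of known heaps instead of range-checking it against
RTE_MAX_NUMA_NODES. All model_* names and the MODEL_* constants below
are hypothetical stand-ins, and the explicit heap count is an
assumption of the model.

```c
#include <assert.h>

#define MODEL_MAX_HEAPS      8 /* stand-in for RTE_MAX_HEAPS */
#define MODEL_MAX_NUMA_NODES 4 /* stand-in for RTE_MAX_NUMA_NODES */

/* hypothetical, stripped-down heap: only the field the lookup needs */
struct model_heap {
	unsigned int socket_id;
};

static struct model_heap model_heaps[MODEL_MAX_HEAPS];
static int model_heap_count;

/* register a heap for a socket ID and return its heap index */
static int
model_add_heap(unsigned int socket_id)
{
	assert(model_heap_count < MODEL_MAX_HEAPS);
	model_heaps[model_heap_count].socket_id = socket_id;
	return model_heap_count++;
}

/*
 * Linear scan in the spirit of malloc_socket_to_heap_id(): a socket ID
 * is valid if and only if some heap carries it, regardless of whether
 * it is below or above the NUMA node limit.
 */
static int
model_socket_to_heap_id(unsigned int socket_id)
{
	int i;

	for (i = 0; i < model_heap_count; i++) {
		if (model_heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

With this shape, a caller only needs to test for a negative return
value; an ID at or above the NUMA node limit is accepted as long as a
heap has been registered for it.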

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e96ec9b43..63bbb1b51 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - ``socket_id`` parameter across the entire DPDK has gained additional
+    meaning, as some socket ID's will now be representing externally allocated
+    memory. No changes will be required for existing code as backwards
+    compatibility will be kept, and those who do not use this feature will not
+    see these extra socket ID's. Any new API's must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether socket ID is a valid one.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1

* [dpdk-dev] [PATCH v4 05/20] flow_classify: do not check for invalid socket ID
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (4 preceding siblings ...)
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 06/20] pipeline: " Anatoly Burakov
                       ` (14 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: Bernard Iremonger, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_flow_classify/rte_flow_classify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_FLOW_CLASSIFY_LOG(ERR,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1

* [dpdk-dev] [PATCH v4 06/20] pipeline: do not check for invalid socket ID
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (5 preceding siblings ...)
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 05/20] flow_classify: " Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 07/20] sched: " Anatoly Burakov
                       ` (13 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_pipeline/rte_pipeline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_LOG(ERR, PIPELINE,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1

* [dpdk-dev] [PATCH v4 07/20] sched: do not check for invalid socket ID
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (6 preceding siblings ...)
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 06/20] pipeline: " Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 08/20] malloc: add name to malloc heaps Anatoly Burakov
                       ` (12 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_sched/rte_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
 		return -1;
 
 	/* socket */
-	if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+	if (params->socket < 0)
 		return -3;
 
 	/* rate */
-- 
2.17.1

* [dpdk-dev] [PATCH v4 08/20] malloc: add name to malloc heaps
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (7 preceding siblings ...)
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 07/20] sched: " Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 09/20] malloc: add function to query socket ID of named heap Anatoly Burakov
                       ` (11 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

We will need to refer to external heaps in some way. While we use
heap IDs internally, the external API needs something more
user-friendly. So, we will be using a string to uniquely
identify a heap.
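
A minimal sketch of the naming scheme, assuming the default heaps are
named "socket_<id>" as in the diff below. The model_* names are
hypothetical; only the 32-byte limit is taken from
RTE_HEAP_NAME_MAX_LEN. Unlike the patch, this version writes straight
into the heap's name field and reports truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_HEAP_NAME_MAX_LEN 32 /* same value as RTE_HEAP_NAME_MAX_LEN */

/* hypothetical, stripped-down heap: socket ID plus name */
struct model_named_heap {
	unsigned int socket_id;
	char name[MODEL_HEAP_NAME_MAX_LEN];
};

/*
 * Give a heap its default "socket_<id>" name.
 * Returns 0 on success, -1 if the name would not fit.
 */
static int
model_name_default_heap(struct model_named_heap *heap, int socket_id)
{
	int n;

	assert(heap != NULL);
	n = snprintf(heap->name, sizeof(heap->name), "socket_%i", socket_id);
	if (n < 0 || (size_t)n >= sizeof(heap->name))
		return -1;
	heap->socket_id = (unsigned int)socket_id;
	return 0;
}

/* bounded comparison of a heap's name against a candidate string */
static int
model_heap_has_name(const struct model_named_heap *heap, const char *name)
{
	return strncmp(heap->name, name, MODEL_HEAP_NAME_MAX_LEN) == 0;
}
```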

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 15 ++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	size_t total_size;
 	unsigned int socket_id;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 73e478076..2a5d2a381 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1020,6 +1019,20 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	/* assign names to default DPDK heaps */
+	for (i = 0; i < rte_socket_count(); i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+		char heap_name[RTE_HEAP_NAME_MAX_LEN];
+		int socket_id = rte_socket_id_by_idx(i);
+
+		snprintf(heap_name, sizeof(heap_name) - 1,
+				"socket_%i", socket_id);
+		strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+		heap->socket_id = socket_id;
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1

* [dpdk-dev] [PATCH v4 09/20] malloc: add function to query socket ID of named heap
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (8 preceding siblings ...)
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 08/20] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 10/20] malloc: add function to check if socket is external Anatoly Burakov
                       ` (10 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

When we create external heaps, each will have its own "fake"
socket ID, so add a function that maps a heap name to its
socket ID.
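
The lookup can be modelled outside of DPDK roughly as follows. This is
a sketch under stated assumptions: the heap table, its contents
(including the external socket ID 256) and all model_* names are made
up for illustration, and plain errno stands in for rte_errno. The
error cases follow the ones documented for
rte_malloc_heap_get_socket() in the diff below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_HEAP_NAME_MAX_LEN 32 /* same value as RTE_HEAP_NAME_MAX_LEN */

struct model_heap {
	char name[MODEL_HEAP_NAME_MAX_LEN];
	int socket_id;
};

/* hypothetical heap table: two NUMA heaps and one external heap */
static const struct model_heap model_heaps[] = {
	{ "socket_0", 0 },
	{ "socket_1", 1 },
	{ "ext_heap", 256 },
};

/* length of s, looking at no more than max bytes */
static size_t
model_strnlen(const char *s, size_t max)
{
	size_t n = 0;

	assert(s != NULL);
	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/*
 * Map a heap name to its socket ID.
 * Returns -1 with errno = EINVAL for a NULL, empty or overlong name,
 * and -1 with errno = ENOENT if no heap has that name.
 */
static int
model_heap_get_socket(const char *name)
{
	size_t i, len;

	if (name == NULL) {
		errno = EINVAL;
		return -1;
	}
	len = model_strnlen(name, MODEL_HEAP_NAME_MAX_LEN);
	if (len == 0 || len == MODEL_HEAP_NAME_MAX_LEN) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < sizeof(model_heaps) / sizeof(model_heaps[0]); i++) {
		if (strncmp(name, model_heaps[i].name,
				MODEL_HEAP_NAME_MAX_LEN) == 0)
			return model_heaps[i].socket_id;
	}
	errno = ENOENT;
	return -1;
}
```

The real function additionally takes the memory hotplug read lock
around the scan, which this single-threaded model leaves out.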

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 37 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..8870732a6 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,20 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Find socket ID corresponding to a named heap.
+ *
+ * @param name
+ *   Heap name to find socket ID for
+ * @return
+ *   Socket ID in case of success (a non-negative number)
+ *   -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``name`` was NULL
+ *     ENOENT - heap identified by the name ``name`` was not found
+ */
+int __rte_experimental
+rte_malloc_heap_get_socket(const char *name);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72632da56..b807dfe09 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <sys/queue.h>
 
+#include <rte_errno.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
 #include <rte_eal.h>
@@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f)
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
+int
+rte_malloc_heap_get_socket(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int idx;
+	int ret;
+
+	if (name == NULL ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) {
+			heap = tmp;
+			break;
+		}
+	}
+
+	if (heap != NULL) {
+		ret = heap->socket_id;
+	} else {
+		rte_errno = ENOENT;
+		ret = -1;
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bbb0..d8f9665b8 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v4 10/20] malloc: add function to check if socket is external
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (9 preceding siblings ...)
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 09/20] malloc: add function to query socket ID of named heap Anatoly Burakov
@ 2018-09-21 16:13     ` Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 11/20] malloc: allow creating malloc heaps Anatoly Burakov
                       ` (9 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs

An API is needed to check whether a particular socket ID belongs
to an internal or external heap. The prime user of this would be
the mempool allocator, because the normal assumption of IOVA
contiguousness in IOVA as VA mode does not hold for externally
allocated memory.
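
To make the effect on mempool population concrete, the two flags
computed in rte_mempool_populate_default() can be written as a pure
function of four booleans. This is a sketch that mirrors the
expressions in the diff below; the struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* hypothetical container for the two decisions made by the mempool code */
struct model_populate_flags {
	bool no_pageshift; /* do not align objects to page boundaries */
	bool try_contig;   /* first try one IOVA-contiguous allocation */
};

/*
 * For internal memory, IOVA as VA mode implies IOVA contiguousness, so
 * page boundaries can be ignored. For external memory that assumption
 * is dropped, and a contiguous allocation is attempted even without
 * hugepages.
 */
static struct model_populate_flags
model_populate_policy(bool no_contig, bool iova_as_va, bool has_hugepages,
		bool external)
{
	struct model_populate_flags f;

	f.no_pageshift = no_contig || (!external && iova_as_va);
	f.try_contig = !no_contig && !f.no_pageshift &&
			(has_hugepages || external);

	/* by construction, the two flags are never set together */
	assert(!(f.no_pageshift && f.try_contig));
	return f;
}
```

For example, an external heap in IOVA as VA mode without hugepages
yields no_pageshift == false and try_contig == true, whereas an
internal heap under the same IOVA mode yields no_pageshift == true.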

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 15 +++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 25 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 lib/librte_mempool/rte_mempool.c           | 22 ++++++++++++++++---
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8870732a6..403271ddc 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -277,6 +277,21 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_get_socket(const char *name);
 
+/**
+ * Check if a given socket ID refers to externally allocated memory.
+ *
+ * @note Passing SOCKET_ID_ANY will return 0.
+ *
+ * @param socket_id
+ *   Socket ID to check
+ * @return
+ *   1 if socket ID refers to externally allocated memory
+ *   0 if socket ID refers to internal DPDK memory
+ *   -1 if socket ID is invalid
+ */
+int __rte_experimental
+rte_malloc_heap_socket_is_external(int socket_id);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b807dfe09..fa81d7862 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -220,6 +220,31 @@ rte_malloc_heap_get_socket(const char *name)
 	return ret;
 }
 
+int
+rte_malloc_heap_socket_is_external(int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int idx;
+	int ret = -1;
+
+	if (socket_id == SOCKET_ID_ANY)
+		return 0;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if ((int)tmp->socket_id == socket_id) {
+			/* external memory always has large socket ID's */
+			ret = tmp->socket_id >= RTE_MAX_NUMA_NODES;
+			break;
+		}
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d8f9665b8..bd60506af 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2ed539f01..683b216f9 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -428,12 +428,18 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	rte_iova_t iova;
 	unsigned mz_id, n;
 	int ret;
-	bool no_contig, try_contig, no_pageshift;
+	bool no_contig, try_contig, no_pageshift, external;
 
 	ret = mempool_ops_alloc_once(mp);
 	if (ret != 0)
 		return ret;
 
+	/* check if we can retrieve a valid socket ID */
+	ret = rte_malloc_heap_socket_is_external(mp->socket_id);
+	if (ret < 0)
+		return -EINVAL;
+	external = ret;
+
 	/* mempool must not be populated */
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
@@ -481,9 +487,19 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	 * in one contiguous chunk as well (otherwise we might end up wasting a
 	 * 1G page on a 10MB memzone). If we fail to get enough contiguous
 	 * memory, then we'll go and reserve space page-by-page.
+	 *
+	 * We also have to take into account the fact that memory that we're
+	 * going to allocate from can belong to an externally allocated memory
+	 * area, in which case the assumption of IOVA as VA mode being
+	 * synonymous with IOVA contiguousness will not hold. We should also try
+	 * to go for contiguous memory even if we're in no-huge mode, because
+	 * external memory may in fact be IOVA-contiguous.
 	 */
-	no_pageshift = no_contig || rte_eal_iova_mode() == RTE_IOVA_VA;
-	try_contig = !no_contig && !no_pageshift && rte_eal_has_hugepages();
+	external = rte_malloc_heap_socket_is_external(mp->socket_id) == 1;
+	no_pageshift = no_contig ||
+			(!external && rte_eal_iova_mode() == RTE_IOVA_VA);
+	try_contig = !no_contig && !no_pageshift &&
+			(rte_eal_has_hugepages() || external);
 
 	if (no_pageshift) {
 		pg_sz = 0;
-- 
2.17.1

* [dpdk-dev] [PATCH v4 11/20] malloc: allow creating malloc heaps
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (10 preceding siblings ...)
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 10/20] malloc: add function to check if socket is external Anatoly Burakov
@ 2018-09-21 16:14     ` Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 12/20] malloc: allow destroying heaps Anatoly Burakov
                       ` (8 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:14 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add an API to allow creating new malloc heaps. They are assigned
socket IDs at or above RTE_MAX_NUMA_NODES, to avoid clashing with
internal heaps.
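
As a rough illustration of the ID assignment and name checks described
above, here is a standalone model that does not use DPDK headers. All
`MODEL_*` and `model_*` names are hypothetical stand-ins. Unlike the patch,
the model scans the whole table for a duplicate name before picking a free
slot, and it returns the new socket ID directly instead of setting
`rte_errno`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All MODEL_* values are illustrative, not DPDK definitions. */
#define MODEL_MAX_NUMA_NODES 8
#define MODEL_MAX_HEAPS 4
#define MODEL_NAME_LEN 32
#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b))
#define MODEL_EXT_MIN_SOCKET_ID CONST_MAX(1 << 8, MODEL_MAX_NUMA_NODES)

struct model_heap {
	char name[MODEL_NAME_LEN]; /* an empty name marks a free slot */
	int socket_id;
};

static struct model_heap model_heaps[MODEL_MAX_HEAPS];
static uint32_t model_next_socket_id = MODEL_EXT_MIN_SOCKET_ID;

/* Length of s, capped at max (portable equivalent of strnlen). */
static size_t
model_bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/* Returns the new heap's socket ID, or a negative errno value. */
static int
model_heap_create(const char *name)
{
	struct model_heap *slot = NULL;
	size_t len;
	int i;

	if (name == NULL)
		return -EINVAL;
	len = model_bounded_len(name, MODEL_NAME_LEN);
	if (len == 0 || len == MODEL_NAME_LEN)
		return -EINVAL;

	for (i = 0; i < MODEL_MAX_HEAPS; i++) {
		if (strncmp(name, model_heaps[i].name, MODEL_NAME_LEN) == 0)
			return -EEXIST;
		if (slot == NULL && model_heaps[i].name[0] == '\0')
			slot = &model_heaps[i];
	}
	if (slot == NULL || model_next_socket_id > (uint32_t)INT32_MAX)
		return -ENOSPC;

	assert(slot->name[0] == '\0');
	memcpy(slot->name, name, len + 1); /* len < MODEL_NAME_LEN */
	slot->socket_id = (int)model_next_socket_id++;
	return slot->socket_id;
}
```

Because the first external ID is the larger of 256 and the NUMA node limit,
a socket ID below `RTE_MAX_NUMA_NODES` can be taken to mean an internal
heap, which is the test the later patches in this series rely on.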

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 19 ++++++++
 lib/librte_eal/common/malloc_heap.c        | 30 +++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 52 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 105 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 2a5d2a381..1dd4ffcf9 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1015,6 +1019,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	static uint32_t next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v4 12/20] malloc: allow destroying heaps
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (11 preceding siblings ...)
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 11/20] malloc: allow creating malloc heaps Anatoly Burakov
@ 2018-09-21 16:14     ` Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 13/20] malloc: allow adding memory to named heaps Anatoly Burakov
                       ` (7 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:14 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add an API to destroy a specified heap. Destruction fails if the heap
still has live allocations or memory segments.
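
The preconditions for destroying a heap can be summarised with a small
standalone model. This is a sketch under stated assumptions, not the DPDK
implementation: `struct model_heap`, `model_heap_destroy()` and
`MODEL_MAX_NUMA_NODES` are hypothetical, locking is omitted, and errors are
returned as negative errno values rather than through `rte_errno`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_NUMA_NODES 8 /* stand-in for RTE_MAX_NUMA_NODES */

struct model_heap {
	char name[32];
	int socket_id;
	unsigned int alloc_count; /* number of live allocations */
	void *first;              /* first memory element, NULL if none */
	void *last;               /* last memory element, NULL if none */
	size_t total_size;
};

/* Returns 0 on success (heap is zeroed), or a negative errno value. */
static int
model_heap_destroy(struct model_heap *heap)
{
	assert(heap != NULL);

	/* internal (per-NUMA-node) heaps are reserved */
	if (heap->socket_id < MODEL_MAX_NUMA_NODES)
		return -EPERM;
	/* memory allocated from the heap has not all been freed */
	if (heap->alloc_count != 0)
		return -EBUSY;
	/* memory segments are still attached to the heap */
	if (heap->first != NULL || heap->last != NULL)
		return -EBUSY;

	/* zeroing the struct also clears the name, freeing the slot */
	memset(heap, 0, sizeof(*heap));
	return 0;
}
```

A caller would therefore free all allocations, remove every memory chunk,
and only then destroy the heap.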

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++
 lib/librte_eal/common/malloc_heap.c        | 22 ++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 58 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 107 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index e326529d0..309bbbcc9 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
 
+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ *   from the heap has been freed back to the heap
+ *
+ * @note This function will return a failure result if not all memory segments
+ *   were removed from the heap prior to its destruction
+ *
+ * @param heap_name
+ *   Name of the heap to destroy.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     ENOENT - heap by the name of ``heap_name`` was not found
+ *     EPERM  - attempting to destroy reserved heap
+ *     EBUSY  - heap still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1dd4ffcf9..e98f720cb 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1045,6 +1045,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	return 0;
 }
 
+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+	if (heap->alloc_count != 0) {
+		RTE_LOG(ERR, EAL, "Heap is still in use\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->first != NULL || heap->last != NULL) {
+		RTE_LOG(ERR, EAL, "Heap still contains memory segments\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->total_size != 0)
+		RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+	/* after this, the lock will be dropped */
+	memset(heap, 0, sizeof(*heap));
+
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index eebee16dc..75278da3c 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 25967a7cb..286e748ef 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -313,6 +313,21 @@ rte_malloc_virt2iova(const void *addr)
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
 
+static struct malloc_heap *
+find_named_heap(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN))
+			return heap;
+	}
+	return NULL;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -363,3 +378,46 @@ rte_malloc_heap_create(const char *heap_name)
 
 	return ret;
 }
+
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* look up the heap by name */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to destroy internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	/* sanity checks done, now we can destroy the heap */
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_destroy(heap);
+
+	/* if we failed, lock is still active */
+	if (ret < 0)
+		rte_spinlock_unlock(&heap->lock);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 376f33bbb..27aac5bea 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_create;
+	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v4 13/20] malloc: allow adding memory to named heaps
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (12 preceding siblings ...)
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 12/20] malloc: allow destroying heaps Anatoly Burakov
@ 2018-09-21 16:14     ` Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 14/20] malloc: allow removing memory from " Anatoly Burakov
                       ` (6 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:14 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add an API to add externally allocated memory to a malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If an IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.
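
To make the per-page bookkeeping concrete, here is a standalone sketch of
how a memory area might be split into page-sized segments, with
`MODEL_BAD_IOVA` standing in for RTE_BAD_IOVA. The names `struct model_seg`
and `model_fill_segments()` and the `max_segs` bound are hypothetical; the
real code stores segments in an fbarray-backed memseg list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_BAD_IOVA UINT64_MAX /* stand-in for RTE_BAD_IOVA */

struct model_seg {
	void *addr;    /* virtual address of this page */
	uint64_t iova; /* IO address, or MODEL_BAD_IOVA if unknown */
	size_t len;    /* always one page in this model */
};

/* Returns the number of segments filled, or a negative errno value. */
static int
model_fill_segments(struct model_seg *segs, unsigned int max_segs,
		void *va_addr, size_t len, const uint64_t *iova_addrs,
		unsigned int n_pages, size_t page_sz)
{
	unsigned int i, n;

	assert(segs != NULL);

	/* page size must be a non-zero power of two */
	if (va_addr == NULL || page_sz == 0 ||
			(page_sz & (page_sz - 1)) != 0)
		return -EINVAL;

	n = (unsigned int)(len / page_sz);
	/* n_pages is only checked when an IOVA table is supplied */
	if (iova_addrs != NULL && n != n_pages)
		return -EINVAL;
	if (n > max_segs)
		return -ENOSPC;

	for (i = 0; i < n; i++) {
		segs[i].addr = (char *)va_addr + (size_t)i * page_sz;
		segs[i].iova = iova_addrs == NULL ?
				MODEL_BAD_IOVA : iova_addrs[i];
		segs[i].len = page_sz;
	}
	return (int)n;
}
```

The IOVA table has one entry per page, so pages that are contiguous in
virtual address space may map to unrelated IO addresses.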

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 74 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 51 +++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 169 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 309bbbcc9..fb5b6e2f7 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,45 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Add memory chunk to a heap with specified name.
+ *
+ * @note Multiple memory chunks can be added to the same heap
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ *   malloc heap. Failing to do so will result in undefined behavior, up to and
+ *   including segmentation faults.
+ *
+ * @note Calling this function will erase any contents already present at the
+ *   supplied memory address.
+ *
+ * @param heap_name
+ *   Name of the heap to add memory chunk to
+ * @param va_addr
+ *   Start of virtual area to add to the heap
+ * @param len
+ *   Length of virtual area to add to the heap
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to add memory to a reserved heap
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index e98f720cb..2f6946f65 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1019,6 +1019,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	char fbarray_name[RTE_FBARRAY_NAME_LEN];
+	struct rte_memseg_list *msl = NULL;
+	struct rte_fbarray *arr;
+	size_t seg_len = n_pages * page_sz;
+	unsigned int i;
+
+	/* first, find a free memseg list */
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+		if (tmp->base_va == NULL) {
+			msl = tmp;
+			break;
+		}
+	}
+	if (msl == NULL) {
+		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+			heap->name, va_addr);
+
+	/* create the backing fbarray */
+	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+			sizeof(struct rte_memseg)) < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+		return -1;
+	}
+	arr = &msl->memseg_arr;
+
+	/* fbarray created, fill it up */
+	for (i = 0; i < n_pages; i++) {
+		struct rte_memseg *ms;
+
+		rte_fbarray_set_used(arr, i);
+		ms = rte_fbarray_get(arr, i);
+		ms->addr = RTE_PTR_ADD(va_addr, i * page_sz);
+		ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+		ms->hugepage_sz = page_sz;
+		ms->len = page_sz;
+		ms->nchannel = rte_memory_get_nchannel();
+		ms->nrank = rte_memory_get_nrank();
+		ms->socket_id = heap->socket_id;
+	}
+
+	/* set up the memseg list */
+	msl->base_va = va_addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = heap->socket_id;
+	msl->len = seg_len;
+	msl->version = 0;
+	msl->external = 1;
+
+	/* erase contents of new memory */
+	memset(va_addr, 0, seg_len);
+
+	/* now, add newly minted memory to the malloc heap */
+	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+	heap->total_size += seg_len;
+
+	/* all done! */
+	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+			heap->name, va_addr);
+
+	return 0;
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 75278da3c..237ce9dc2 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 286e748ef..acdbd92a2 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -328,6 +328,57 @@ find_named_heap(const char *name)
 	return NULL;
 }
 
+int
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int n;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL ||
+			page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		/* hotplug lock is not held yet, so return directly */
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot add memory to internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	n = len / page_sz;
+	if (n != n_pages && iova_addrs != NULL) {
+		rte_errno = EINVAL;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+			page_sz);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 27aac5bea..02254042c 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -321,6 +321,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_create;
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_memory_add;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v4 14/20] malloc: allow removing memory from named heaps
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (13 preceding siblings ...)
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 13/20] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-09-21 16:14     ` Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 15/20] malloc: allow attaching to external memory chunks Anatoly Burakov
                       ` (5 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:14 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).
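
The two checks can be sketched as a standalone model. It assumes a
simplified, address-ordered element list in which each element records the
length of the region (memseg list) it belongs to; `struct model_elem` and
`model_can_remove()` are hypothetical names, and the function only reports
whether removal would be allowed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum model_state { MODEL_FREE, MODEL_BUSY };

struct model_elem {
	struct model_elem *next; /* next element, in address order */
	uintptr_t start;         /* start address of this element */
	size_t size;             /* size of this element */
	size_t region_len;       /* length of the region it belongs to */
	enum model_state state;
};

/* Returns 0 if the region may be removed, or a negative errno value. */
static int
model_can_remove(const struct model_elem *first, uintptr_t va, size_t len)
{
	const struct model_elem *elem = first;

	/* walk the list until we reach or pass the requested address */
	while (elem != NULL && elem->start < va) {
		assert(elem->next == NULL ||
				elem->next->start > elem->start);
		elem = elem->next;
	}
	/* no element starts here, or it is not the region that was added */
	if (elem == NULL || elem->start != va || elem->region_len != len)
		return -ENOENT;
	/* a fully free region is one free element spanning all of it */
	if (elem->state == MODEL_BUSY || elem->size != len)
		return -EBUSY;
	return 0;
}
```

The size comparison works because adjacent free elements are merged: if
anything inside the region is still allocated, the element at the start of
the region cannot span its full length.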

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++
 lib/librte_eal/common/malloc_heap.c        | 54 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 125 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index fb5b6e2f7..40bae4478 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -302,6 +302,33 @@ int __rte_experimental
 rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+/**
+ * Remove memory chunk from heap with specified name.
+ *
+ * @note Memory chunk being removed must be the same as one that was added;
+ *   partially removing memory chunks is not supported
+ *
+ * @note Memory area must not contain any allocated elements to allow its
+ *   removal from the heap
+ *
+ * @param heap_name
+ *   Name of the heap to remove memory from
+ * @param va_addr
+ *   Virtual address to remove from the heap
+ * @param len
+ *   Length of virtual area to remove from the heap
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to remove memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ *     EBUSY  - memory chunk still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 2f6946f65..3ac3b06de 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1019,6 +1019,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+	struct malloc_heap *heap = elem->heap;
+	struct rte_memseg_list *msl;
+
+	msl = elem->msl;
+
+	/* this element can be removed */
+	malloc_elem_free_list_remove(elem);
+	malloc_elem_hide_region(elem, elem, len);
+
+	heap->total_size -= len;
+
+	memset(elem, 0, sizeof(*elem));
+
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -1093,6 +1119,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	return 0;
 }
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len)
+{
+	struct malloc_elem *elem = heap->first;
+
+	/* find element with specified va address */
+	while (elem != NULL && elem != va_addr) {
+		elem = elem->next;
+		/* stop if we've blown past our VA */
+		if (elem > (struct malloc_elem *)va_addr) {
+			rte_errno = ENOENT;
+			return -1;
+		}
+	}
+	/* check if element was found */
+	if (elem == NULL || elem->msl->len != len) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+	/* if element's size is not equal to segment len, segment is busy */
+	if (elem->state == ELEM_BUSY || elem->size != len) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+	return destroy_seg(elem, len);
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 237ce9dc2..e48996d52 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -43,6 +43,10 @@ int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index acdbd92a2..bfc49d0b7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -379,6 +379,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot remove memory from internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 02254042c..8c66d0be9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v4 15/20] malloc: allow attaching to external memory chunks
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (14 preceding siblings ...)
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 14/20] malloc: allow removing memory from " Anatoly Burakov
@ 2018-09-21 16:14     ` Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 16/20] malloc: allow detaching from external memory Anatoly Burakov
                       ` (4 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:14 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 48 +++++++++--
 lib/librte_eal/common/rte_malloc.c         | 93 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 135 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 40bae4478..ec59302de 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket,
  *
  * @note Multiple memory chunks can be added to the same heap
  *
+ * @note Before this memory can be accessed in another process, it must be
+ *   attached in that process by calling
+ *   ``rte_malloc_heap_memory_attach``.
+ *
  * @note Memory must be previously allocated for DPDK to be able to use it as a
  *   malloc heap. Failing to do so will result in undefined behavior, up to and
  *   including segmentation faults.
@@ -329,21 +333,48 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 int __rte_experimental
 rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Attach to an already existing chunk of external memory in another process.
+ *
+ * @note This function must be called before any attempt is made to use an
+ *   already existing external memory chunk. This function does *not* need to
+ *   be called if a call to ``rte_malloc_heap_memory_add`` was made in the
+ *   current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to attach to
+ * @param len
+ *   Length of memory chunk to attach to
+ * @return
+ *   0 on successful attach
+ *   -1 on unsuccessful attach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to attach memory to a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
  * @note Heaps created via this call will automatically get assigned a unique
  *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
  *
+ * @note This function can only be called in primary process.
+ *
  * @param heap_name
  *   Name of the heap to create.
  *
  * @return
  *   - 0 on successful creation
  *   - -1 in case of error, with rte_errno set to one of the following:
- *     EINVAL - ``heap_name`` was NULL, empty or too long
- *     EEXIST - heap by name of ``heap_name`` already exists
- *     ENOSPC - no more space in internal config to store a new heap
+ *     EINVAL          - ``heap_name`` was NULL, empty or too long
+ *     EEXIST          - heap by name of ``heap_name`` already exists
+ *     ENOSPC          - no more space in internal config to store a new heap
+ *     E_RTE_SECONDARY - attempted to create a heap in secondary process
  */
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
@@ -357,16 +388,19 @@ rte_malloc_heap_create(const char *heap_name);
  * @note This function will return a failure result if not all memory segments
  *   were removed from the heap prior to its destruction
  *
+ * @note This function can only be called in primary process.
+ *
  * @param heap_name
  *   Name of the heap to create.
  *
  * @return
  *   - 0 on success
  *   - -1 in case of error, with rte_errno set to one of the following:
- *     EINVAL - ``heap_name`` was NULL, empty or too long
- *     ENOENT - heap by the name of ``heap_name`` was not found
- *     EPERM  - attempting to destroy reserved heap
- *     EBUSY  - heap still contains data
+ *     EINVAL          - ``heap_name`` was NULL, empty or too long
+ *     ENOENT          - heap by the name of ``heap_name`` was not found
+ *     EPERM           - attempting to destroy reserved heap
+ *     EBUSY           - heap still contains data
+ *     E_RTE_SECONDARY - attempted to destroy a heap in secondary process
  */
 int __rte_experimental
 rte_malloc_heap_destroy(const char *heap_name);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index bfc49d0b7..01c112a46 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -418,6 +418,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+struct sync_mem_walk_arg {
+	void *va_addr;
+	size_t len;
+	int result;
+};
+
+static int
+attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct sync_mem_walk_arg *wa = arg;
+	size_t len = msl->page_sz * msl->memseg_arr.len;
+
+	if (msl->base_va == wa->va_addr &&
+			len == wa->len) {
+		struct rte_memseg_list *found_msl;
+		int msl_idx, ret;
+
+		/* msl is const */
+		msl_idx = msl - mcfg->memsegs;
+		found_msl = &mcfg->memsegs[msl_idx];
+
+		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+
+		if (ret < 0)
+			wa->result = -rte_errno;
+		else
+			wa->result = 0;
+		return 1;
+	}
+	return 0;
+}
+
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	struct sync_mem_walk_arg wa;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to attach to internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* find corresponding memseg list to attach to */
+	wa.va_addr = va_addr;
+	wa.len = len;
+	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+
+	/* we're already holding a read lock */
+	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+
+	if (wa.result < 0) {
+		rte_errno = -wa.result;
+		ret = -1;
+	} else {
+		ret = 0;
+	}
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -425,6 +508,11 @@ rte_malloc_heap_create(const char *heap_name)
 	struct malloc_heap *heap = NULL;
 	int i, ret;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		rte_errno = E_RTE_SECONDARY;
+		return -1;
+	}
+
 	if (heap_name == NULL ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
@@ -476,6 +564,11 @@ rte_malloc_heap_destroy(const char *heap_name)
 	struct malloc_heap *heap = NULL;
 	int ret;
 
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		rte_errno = E_RTE_SECONDARY;
+		return -1;
+	}
+
 	if (heap_name == NULL ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
 			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 8c66d0be9..920852042 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_attach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v4 16/20] malloc: allow detaching from external memory
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (15 preceding siblings ...)
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 15/20] malloc: allow attaching to external memory chunks Anatoly Burakov
@ 2018-09-21 16:14     ` Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 17/20] test: add unit tests for external memory support Anatoly Burakov
                       ` (3 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:14 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add API to detach from existing chunk of external memory in a
process.
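
For readers following along, here is a minimal standalone sketch of the
dispatch this patch introduces: one walk callback that attaches or
detaches depending on a flag carried in the walk argument. It does not
use DPDK; struct model_msl, the model_* functions and the "attached"
field are hypothetical stand-ins for the memseg list and the
rte_fbarray_attach()/rte_fbarray_detach() calls, and errors are returned
as negative errno values instead of being stored in rte_errno.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memseg list: only what the walk needs. */
struct model_msl {
	void *base_va;   /* start of the external chunk */
	size_t len;      /* length of the chunk in bytes */
	bool attached;   /* models the per-process fbarray mapping */
};

/* Same shape as struct sync_mem_walk_arg in the patch. */
struct model_walk_arg {
	void *va_addr;
	size_t len;
	int result;
	bool attach;
};

/* Walk callback: return 1 to stop once the matching chunk was handled. */
static int
model_sync_walk(struct model_msl *msl, void *arg)
{
	struct model_walk_arg *wa = arg;

	if (msl->base_va != wa->va_addr || msl->len != wa->len)
		return 0; /* not our chunk, keep walking */

	/* the real code calls rte_fbarray_attach() or rte_fbarray_detach() */
	msl->attached = wa->attach;
	wa->result = 0;
	return 1;
}

/* Returns 0 or a negative errno value (the real API sets rte_errno). */
int
model_sync_memory(struct model_msl *lists, size_t n_lists, void *va_addr,
		size_t len, bool attach)
{
	struct model_walk_arg wa = {
		.va_addr = va_addr,
		.len = len,
		.result = -ENOENT, /* fail unless a chunk matches */
		.attach = attach,
	};
	size_t i;

	if (lists == NULL || va_addr == NULL || len == 0)
		return -EINVAL;

	for (i = 0; i < n_lists; i++) {
		if (model_sync_walk(&lists[i], &wa))
			break;
	}
	return wa.result;
}
```

Keeping the attach and detach paths in one helper means both share the
same parameter checks and the same "not found unless matched" default.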

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 27 ++++++++++++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 27 ++++++++++++++++++----
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index ec59302de..6ef948ae8 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
  * @note Memory area must not contain any allocated elements to allow its
  *   removal from the heap
  *
+ * @note All other processes must detach from the memory chunk prior to it being
+ *   removed from the heap.
+ *
  * @param heap_name
  *   Name of the heap to remove memory from
  * @param va_addr
@@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 int __rte_experimental
 rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Detach from a chunk of external memory in secondary process.
+ *
+ * @note This function must be called before any attempt is made to remove
+ *   external memory from the heap in another process. This function does *not*
+ *   need to be called if ``rte_malloc_heap_memory_remove`` will be called in
+ *   the current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to detach from
+ * @param len
+ *   Length of memory chunk to detach from
+ * @return
+ *   0 on successful detach
+ *   -1 on unsuccessful detach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to detach memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 01c112a46..04d554c2a 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -422,10 +422,11 @@ struct sync_mem_walk_arg {
 	void *va_addr;
 	size_t len;
 	int result;
+	bool attach;
 };
 
 static int
-attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct sync_mem_walk_arg *wa = arg;
@@ -440,7 +441,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		if (wa->attach)
+			ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		else
+			ret = rte_fbarray_detach(&found_msl->memseg_arr);
 
 		if (ret < 0)
 			wa->result = -rte_errno;
@@ -451,8 +455,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 	return 0;
 }
 
-int
-rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+static int
+sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
@@ -486,9 +490,10 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	wa.va_addr = va_addr;
 	wa.len = len;
 	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+	wa.attach = attach;
 
 	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
 
 	if (wa.result < 0) {
 		rte_errno = -wa.result;
@@ -501,6 +506,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, true);
+}
+
+int
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, false);
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 920852042..30583eef2 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -323,6 +323,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
 	rte_malloc_heap_memory_attach;
+	rte_malloc_heap_memory_detach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v4 17/20] test: add unit tests for external memory support
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (16 preceding siblings ...)
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 16/20] malloc: allow detaching from external memory Anatoly Burakov
@ 2018-09-21 16:14     ` Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 18/20] app/testpmd: add support for external memory Anatoly Burakov
                       ` (2 subsequent siblings)
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:14 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add simple unit tests to test external memory support.
The tests are pretty basic and mostly consist of checking
if invalid API calls are handled correctly, plus a simple
allocation/deallocation test for malloc and memzone.
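
As an illustration of the checking style used by these tests (a call
must fail and must also report the expected error code), here is a
standalone sketch. It models only the heap name validation; the limit
MODEL_NAME_MAX_LEN is an assumed value standing in for
RTE_HEAP_NAME_MAX_LEN, plain errno stands in for rte_errno, and the
model_* helpers are hypothetical, not part of the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed limit; stands in for RTE_HEAP_NAME_MAX_LEN. */
#define MODEL_NAME_MAX_LEN 32

/* Bounded length, equivalent to strnlen(name, max). */
static size_t
model_bounded_len(const char *name, size_t max)
{
	size_t n = 0;

	while (n < max && name[n] != '\0')
		n++;
	return n;
}

/* Returns 0 if the name is usable, -1 with errno set to EINVAL if not. */
int
model_check_heap_name(const char *name)
{
	size_t n;

	if (name == NULL) {
		errno = EINVAL;
		return -1;
	}
	n = model_bounded_len(name, MODEL_NAME_MAX_LEN);
	/* empty, or no terminator within the limit (too long) */
	if (n == 0 || n == MODEL_NAME_MAX_LEN) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Test helper: the call must fail and report the expected errno. */
int
model_fails_with(int ret, int expected_errno)
{
	return ret < 0 && errno == expected_errno;
}
```

Checking the error code as well as the return value matters because
these calls can fail for unrelated reasons, in which case a bare
"ret < 0" check would pass for the wrong reason.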

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 test/test/Makefile            |   1 +
 test/test/autotest_data.py    |  14 +-
 test/test/meson.build         |   1 +
 test/test/test_external_mem.c | 389 ++++++++++++++++++++++++++++++++++
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 test/test/test_external_mem.c

diff --git a/test/test/Makefile b/test/test/Makefile
index e6967bab6..074ac6e03 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c
 SRCS-y += test_reciprocal_division.c
 SRCS-y += test_reciprocal_division_perf.c
 SRCS-y += test_fbarray.c
+SRCS-y += test_external_mem.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py
index f68d9b111..51f8e1689 100644
--- a/test/test/autotest_data.py
+++ b/test/test/autotest_data.py
@@ -477,10 +477,16 @@
         "Report":  None,
     },
     {
-        "Name":    "Fbarray autotest",
-        "Command": "fbarray_autotest",
-        "Func":    default_autotest,
-        "Report":  None,
+	"Name":    "Fbarray autotest",
+	"Command": "fbarray_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
+    },
+    {
+	"Name":    "External memory autotest",
+	"Command": "external_mem_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
     },
     #
     #Please always keep all dump tests at the end and together!
diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6eca2..3abf02b71 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -155,6 +155,7 @@ test_names = [
 	'eventdev_common_autotest',
 	'eventdev_octeontx_autotest',
 	'eventdev_sw_autotest',
+	'external_mem_autotest',
 	'func_reentrancy_autotest',
 	'flow_classify_autotest',
 	'hash_scaling_autotest',
diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c
new file mode 100644
index 000000000..d0837aa35
--- /dev/null
+++ b/test/test/test_external_mem.c
@@ -0,0 +1,389 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+
+#include "test.h"
+
+#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */
+
+static int
+test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova,
+		int n_pages)
+{
+	static const char * const names[] = {
+		NULL, /* NULL name */
+		"",   /* empty name */
+		"this heap name is definitely way too long to be valid"
+	};
+	const char *valid_name = "valid heap name";
+	unsigned int i;
+
+	/* check invalid name handling */
+	for (i = 0; i < RTE_DIM(names); i++) {
+		const char *name = names[i];
+
+		/* these calls may fail for other reasons, so check errno */
+		if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Created heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Destroyed heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_get_socket(name) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Found socket for heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_add(name, addr, len,
+				NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Added memory to heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Removed memory from heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Attached memory to heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Detached memory from heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* do same as above, but with a valid heap name */
+
+	/* skip create call */
+	if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Destroyed non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_get_socket(valid_name) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Found socket for non-existent heap\n",
+				__func__, __LINE__);
+		goto fail;
+	}
+
+	/* these calls may fail for other reasons, so check errno */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Added memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Removed memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Attached memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Detached memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* create a valid heap but test other invalid parameters */
+	if (rte_malloc_heap_create(valid_name) != 0) {
+		printf("%s():%i: Failed to create valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero length */
+	if (rte_malloc_heap_memory_add(valid_name, addr, 0,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero address */
+	if (rte_malloc_heap_memory_add(valid_name, NULL, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* wrong page count */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* tests passed, destroy heap */
+	if (rte_malloc_heap_destroy(valid_name) != 0) {
+		printf("%s():%i: Failed to destroy valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	return 0;
+fail:
+	rte_malloc_heap_destroy(valid_name);
+	return -1;
+}
+
+static int
+test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages)
+{
+	const char *heap_name = "heap";
+	void *ptr = NULL;
+	int socket_id, i;
+	const struct rte_memzone *mz = NULL;
+
+	/* create heap */
+	if (rte_malloc_heap_create(heap_name) != 0) {
+		printf("%s():%i: Failed to create malloc heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* get socket ID corresponding to this heap */
+	socket_id = rte_malloc_heap_get_socket(heap_name);
+	if (socket_id < 0) {
+		printf("%s():%i: cannot find socket for external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* heap is empty, so any allocation should fail */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr != NULL) {
+		printf("%s():%i: Allocated from empty heap\n", __func__,
+			__LINE__);
+		goto fail;
+	}
+
+	/* add memory to heap */
+	if (rte_malloc_heap_memory_add(heap_name, addr, len,
+			iova, n_pages, pgsz) != 0) {
+		printf("%s():%i: Failed to add memory to heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check that we can get this memory from EAL now */
+	for (i = 0; i < n_pages; i++) {
+		const struct rte_memseg *ms;
+		void *cur = RTE_PTR_ADD(addr, pgsz * i);
+
+		ms = rte_mem_virt2memseg(cur, NULL);
+		if (ms == NULL) {
+			printf("%s():%i: Failed to retrieve memseg for external mem\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (ms->addr != cur) {
+			printf("%s():%i: VA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+		if (ms->iova != iova[i]) {
+			printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* allocate - this now should succeed */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr == NULL) {
+		printf("%s():%i: Failed to allocate from external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check if address is in expected range */
+	if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) {
+		printf("%s():%i: Allocated from unexpected address space\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* we've allocated something - removing memory should fail */
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 ||
+			rte_errno != EBUSY) {
+		printf("%s():%i: Removing memory succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) {
+		printf("%s():%i: Destroying heap succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* try allocating an IOVA-contiguous memzone - this should succeed
+	 * because we've set up a contiguous IOVA table.
+	 */
+	mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id,
+			RTE_MEMZONE_IOVA_CONTIG);
+	if (mz == NULL) {
+		printf("%s():%i: Failed to reserve memzone\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	rte_malloc_dump_stats(stdout, NULL);
+	rte_malloc_dump_heaps(stdout);
+
+	/* free memory - removing it should now succeed */
+	rte_free(ptr);
+	ptr = NULL;
+
+	rte_memzone_free(mz);
+	mz = NULL;
+
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) {
+		printf("%s():%i: Removing memory from heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) != 0) {
+		printf("%s():%i: Destroying heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	return 0;
+fail:
+	rte_memzone_free(mz);
+	rte_free(ptr);
+	/* even if something failed, attempt to clean up */
+	rte_malloc_heap_memory_remove(heap_name, addr, len);
+	rte_malloc_heap_destroy(heap_name);
+
+	return -1;
+}
+
+/* we need to test attach/detach in secondary processes. */
+static int
+test_external_mem(void)
+{
+	size_t len = EXTERNAL_MEM_SZ;
+	size_t pgsz = RTE_PGSIZE_4K;
+	rte_iova_t iova[len / pgsz];
+	void *addr;
+	int ret, n_pages;
+
+	/* create external memory area */
+	n_pages = RTE_DIM(iova);
+	addr = mmap(NULL, len, PROT_WRITE | PROT_READ,
+			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED) {
+		printf("%s():%i: Failed to create dummy memory area\n",
+			__func__, __LINE__);
+		return -1;
+	}
+	for (int i = 0; i < n_pages; i++) {
+		/* arbitrary IOVA */
+		rte_iova_t tmp = 0x100000000 + i * pgsz;
+		iova[i] = tmp;
+	}
+
+	ret = test_invalid_param(addr, len, pgsz, iova, n_pages);
+	ret |= test_basic(addr, len, pgsz, iova, n_pages);
+
+	munmap(addr, len);
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem);
-- 
2.17.1


* [dpdk-dev] [PATCH v4 18/20] app/testpmd: add support for external memory
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (17 preceding siblings ...)
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 17/20] test: add unit tests for external memory support Anatoly Burakov
@ 2018-09-21 16:14     ` Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 19/20] doc: add external memory feature to the release notes Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 20/20] doc: add external memory feature to programmer's guide Anatoly Burakov
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:14 UTC (permalink / raw)
  To: dev
  Cc: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger, John McNamara,
	Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

Currently, mempools can only be allocated either using native
DPDK memory, or anonymous memory. This patch will add two new
methods to allocate mempool using external memory (regular or
hugepage memory), and add documentation about it to testpmd
user guide.

It adds a new flag "--mp-alloc", with four possible values:
native (use regular DPDK allocator), anon (use anonymous
mempool), xmem (use externally allocated memory area), and
xmemhuge (use externally allocated hugepage memory area). Old
flag "--mp-anon" is kept for compatibility.
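
A minimal sketch of how the four values could map to allocation modes
and back is shown here. The MP_ALLOC_* names come from this patch, but
their numeric values, the MP_ALLOC_COUNT sentinel, the lookup table and
the model_* functions are assumptions made for illustration; the patch
itself uses an if/else chain in launch_args_parse() and a switch in
mp_alloc_to_str().

```c
#include <stddef.h>
#include <string.h>

/* Mode names as in the patch; the numeric values are assumed. */
enum model_mp_alloc {
	MP_ALLOC_NATIVE,
	MP_ALLOC_ANON,
	MP_ALLOC_XMEM,
	MP_ALLOC_XMEM_HUGE,
	MP_ALLOC_COUNT /* hypothetical sentinel, not in the patch */
};

static const char * const model_mp_alloc_names[MP_ALLOC_COUNT] = {
	[MP_ALLOC_NATIVE] = "native",
	[MP_ALLOC_ANON] = "anon",
	[MP_ALLOC_XMEM] = "xmem",
	[MP_ALLOC_XMEM_HUGE] = "xmemhuge",
};

/* Returns the mode for a --mp-alloc argument, or -1 if unrecognised. */
int
model_mp_alloc_parse(const char *arg)
{
	int i;

	if (arg == NULL)
		return -1;
	for (i = 0; i < MP_ALLOC_COUNT; i++) {
		/* exact match, so "xmem" never matches "xmemhuge" */
		if (strcmp(arg, model_mp_alloc_names[i]) == 0)
			return i;
	}
	return -1;
}

/* Returns the printable name of a mode, or "invalid". */
const char *
model_mp_alloc_name(int mode)
{
	if (mode < 0 || mode >= MP_ALLOC_COUNT)
		return "invalid";
	return model_mp_alloc_names[mode];
}
```

A single table keeps the parser, the display string and the usage text
from drifting apart when a mode is added.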

All external memory is allocated using the same external heap,
but each mempool will allocate and add its own memory area.
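
The size of each such area is estimated pessimistically by
calc_mem_size() in the testpmd changes of this patch. The arithmetic can
be sketched standalone as follows. This is a model under assumptions:
the object size is passed in directly (the patch derives it with
rte_mempool_calc_obj_size()), the 32MB header allowance is taken from
the patch, and the model_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed allowance for mempool overhead, 32MB as in the patch. */
#define MODEL_HDR_MEM ((uint64_t)32 << 20)

/*
 * Worst-case bytes needed for nb_objs objects of obj_sz bytes when no
 * object may cross a page boundary. Returns 0 on success, -1 on error.
 */
int
model_calc_mem_size(uint32_t nb_objs, uint64_t obj_sz, uint64_t pgsz,
		uint64_t *out)
{
	uint64_t objs_per_pg, n_pages, total;

	if (out == NULL || obj_sz == 0 || pgsz == 0 || obj_sz > pgsz)
		return -1;

	/* whole objects per page; the remainder of each page is wasted */
	objs_per_pg = pgsz / obj_sz;
	/* round up so a partially filled last page is counted */
	n_pages = (nb_objs + objs_per_pg - 1) / objs_per_pg;

	total = MODEL_HDR_MEM + n_pages * pgsz;
	/* round the total up to a whole number of pages */
	total = (total + pgsz - 1) / pgsz * pgsz;

	*out = total;
	return 0;
}
```

For example, 1000 objects of 2304 bytes need one 4K page each, so the
estimate is 32MB plus 1000 pages. With 2MB pages the same objects fit
in two pages, since 910 of them fit in each page.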

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/config.c                 |  21 +-
 app/test-pmd/parameters.c             |  23 +-
 app/test-pmd/testpmd.c                | 337 ++++++++++++++++++++++++--
 app/test-pmd/testpmd.h                |  13 +-
 doc/guides/testpmd_app_ug/run_app.rst |  12 +
 5 files changed, 381 insertions(+), 25 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a0f934932..4789910b3 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2413,6 +2413,23 @@ fwd_config_setup(void)
 		simple_fwd_config_setup();
 }
 
+static const char *
+mp_alloc_to_str(uint8_t mode)
+{
+	switch (mode) {
+	case MP_ALLOC_NATIVE:
+		return "native";
+	case MP_ALLOC_ANON:
+		return "anon";
+	case MP_ALLOC_XMEM:
+		return "xmem";
+	case MP_ALLOC_XMEM_HUGE:
+		return "xmemhuge";
+	default:
+		return "invalid";
+	}
+}
+
 void
 pkt_fwd_config_display(struct fwd_config *cfg)
 {
@@ -2421,12 +2438,12 @@ pkt_fwd_config_display(struct fwd_config *cfg)
 	streamid_t sm_id;
 
 	printf("%s packet forwarding%s - ports=%d - cores=%d - streams=%d - "
-		"NUMA support %s, MP over anonymous pages %s\n",
+		"NUMA support %s, MP allocation mode: %s\n",
 		cfg->fwd_eng->fwd_mode_name,
 		retry_enabled == 0 ? "" : " with retry",
 		cfg->nb_fwd_ports, cfg->nb_fwd_lcores, cfg->nb_fwd_streams,
 		numa_support == 1 ? "enabled" : "disabled",
-		mp_anon != 0 ? "enabled" : "disabled");
+		mp_alloc_to_str(mp_alloc_type));
 
 	if (retry_enabled)
 		printf("TX retry num: %u, delay between TX retries: %uus\n",
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 9220e1c1b..b4016668c 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -190,6 +190,11 @@ usage(char* progname)
 	printf("  --vxlan-gpe-port=N: UPD port of tunnel VXLAN-GPE\n");
 	printf("  --mlockall: lock all memory\n");
 	printf("  --no-mlockall: do not lock all memory\n");
+	printf("  --mp-alloc <native|anon|xmem|xmemhuge>: mempool allocation method.\n"
+	       "    native: use regular DPDK memory to create and populate mempool\n"
+	       "    anon: use regular DPDK memory to create and anonymous memory to populate mempool\n"
+	       "    xmem: use anonymous memory to create and populate mempool\n"
+	       "    xmemhuge: use anonymous hugepage memory to create and populate mempool\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -625,6 +630,7 @@ launch_args_parse(int argc, char** argv)
 		{ "vxlan-gpe-port",		1, 0, 0 },
 		{ "mlockall",			0, 0, 0 },
 		{ "no-mlockall",		0, 0, 0 },
+		{ "mp-alloc",			1, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -743,7 +749,22 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "numa"))
 				numa_support = 1;
 			if (!strcmp(lgopts[opt_idx].name, "mp-anon")) {
-				mp_anon = 1;
+				mp_alloc_type = MP_ALLOC_ANON;
+			}
+			if (!strcmp(lgopts[opt_idx].name, "mp-alloc")) {
+				if (!strcmp(optarg, "native"))
+					mp_alloc_type = MP_ALLOC_NATIVE;
+				else if (!strcmp(optarg, "anon"))
+					mp_alloc_type = MP_ALLOC_ANON;
+				else if (!strcmp(optarg, "xmem"))
+					mp_alloc_type = MP_ALLOC_XMEM;
+				else if (!strcmp(optarg, "xmemhuge"))
+					mp_alloc_type = MP_ALLOC_XMEM_HUGE;
+				else
+					rte_exit(EXIT_FAILURE,
+						"mp-alloc %s invalid - must be: "
+						"native, anon, xmem or xmemhuge\n",
+						 optarg);
 			}
 			if (!strcmp(lgopts[opt_idx].name, "port-numa-config")) {
 				if (parse_portnuma_config(optarg))
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e552..7c5f7dd0a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -27,6 +27,7 @@
 #include <rte_log.h>
 #include <rte_debug.h>
 #include <rte_cycles.h>
+#include <rte_malloc_heap.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_launch.h>
@@ -60,9 +61,18 @@
 #ifdef RTE_LIBRTE_LATENCY_STATS
 #include <rte_latencystats.h>
 #endif
+#include <rte_vfio.h>
 
 #include "testpmd.h"
 
+#ifndef MAP_HUGE_SHIFT
+#define HUGE_SHIFT 26
+#else
+#define HUGE_SHIFT MAP_HUGE_SHIFT
+#endif
+
+#define EXTMEM_HEAP_NAME "extmem"
+
 uint16_t verbose_level = 0; /**< Silent by default. */
 int testpmd_logtype; /**< Log type for testpmd logs */
 
@@ -88,9 +98,13 @@ uint8_t numa_support = 1; /**< numa enabled by default */
 uint8_t socket_num = UMA_NO_CONFIG;
 
 /*
- * Use ANONYMOUS mapped memory (might be not physically continuous) for mbufs.
+ * Select mempool allocation type:
+ * - native: use regular DPDK memory
+ * - anon: use regular DPDK memory to create mempool, but populate using
+ *         anonymous memory (may not be IOVA-contiguous)
+ * - xmem, xmemhuge: use externally allocated anonymous or hugepage memory
  */
-uint8_t mp_anon = 0;
+uint8_t mp_alloc_type = MP_ALLOC_NATIVE;
 
 /*
  * Store specified sockets on which memory pool to be used by ports
@@ -527,6 +541,255 @@ set_def_fwd_config(void)
 	set_default_fwd_ports_config();
 }
 
+/* extremely pessimistic estimation of memory required to create a mempool */
+static int
+calc_mem_size(uint32_t nb_mbufs, uint32_t mbuf_sz, size_t pgsz, size_t *out)
+{
+	unsigned int n_pages, mbuf_per_pg, leftover;
+	uint64_t total_mem, mbuf_mem, obj_sz;
+
+	/* there is no good way to predict how much space the mempool will
+	 * occupy because it will allocate chunks on the fly, and some of those
+	 * will come from default DPDK memory while some will come from our
+	 * external memory, so just assume 32MB will be enough for everyone.
+	 */
+	uint64_t hdr_mem = 32 << 20;
+
+	/* account for possible non-contiguousness */
+	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
+	if (obj_sz > pgsz) {
+		TESTPMD_LOG(ERR, "Object size is bigger than page size\n");
+		return -1;
+	}
+
+	mbuf_per_pg = pgsz / obj_sz;
+	leftover = (nb_mbufs % mbuf_per_pg) > 0;
+	n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
+
+	mbuf_mem = n_pages * pgsz;
+
+	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
+
+	if (total_mem > SIZE_MAX) {
+		TESTPMD_LOG(ERR, "Memory size too big\n");
+		return -1;
+	}
+	*out = (size_t)total_mem;
+
+	return 0;
+}
+
+static inline uint32_t
+bsf64(uint64_t v)
+{
+	return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+	if (v == 0)
+		return 0;
+	v = rte_align64pow2(v);
+	return bsf64(v);
+}
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+	/* as per mmap() manpage, all page sizes are log2 of page size
+	 * shifted by MAP_HUGE_SHIFT
+	 */
+	int log2 = log2_u64(page_sz);
+	return log2 << HUGE_SHIFT;
+}
+
+static void *
+alloc_mem(size_t memsz, size_t pgsz, bool huge)
+{
+	void *addr;
+	int flags;
+
+	/* allocate anonymous memory, backed by hugepages if requested */
+	flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	if (huge)
+		flags |= MAP_HUGETLB | pagesz_flags(pgsz);
+
+	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
+	if (addr == MAP_FAILED)
+		return NULL;
+
+	return addr;
+}
+
+struct extmem_param {
+	void *addr;
+	size_t len;
+	size_t pgsz;
+	rte_iova_t *iova_table;
+	unsigned int iova_table_len;
+};
+
+static int
+create_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, struct extmem_param *param,
+		bool huge)
+{
+	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
+			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
+	unsigned int n_pages, cur_page, pgsz_idx;
+	size_t mem_sz, offset, cur_pgsz;
+	bool vfio_supported = true;
+	rte_iova_t *iovas = NULL;
+	void *addr;
+	int ret;
+
+	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
+		/* skip anything that is too big */
+		if (pgsizes[pgsz_idx] > SIZE_MAX)
+			continue;
+
+		cur_pgsz = pgsizes[pgsz_idx];
+
+		/* if we were told not to allocate hugepages, override */
+		if (!huge)
+			cur_pgsz = sysconf(_SC_PAGESIZE);
+
+		ret = calc_mem_size(nb_mbufs, mbuf_sz, cur_pgsz, &mem_sz);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot calculate memory size\n");
+			return -1;
+		}
+
+		/* allocate our memory */
+		addr = alloc_mem(mem_sz, cur_pgsz, huge);
+
+		/* if we couldn't allocate memory with a specified page size,
+		 * that doesn't mean we can't do it with other page sizes, so
+		 * try another one.
+		 */
+		if (addr == NULL)
+			continue;
+
+		/* store IOVA addresses for every page in this memory area */
+		n_pages = mem_sz / cur_pgsz;
+
+		iovas = malloc(sizeof(*iovas) * n_pages);
+
+		if (iovas == NULL) {
+			TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n");
+			goto fail;
+		}
+
+		/* populate IOVA table */
+		for (cur_page = 0; cur_page < n_pages; cur_page++) {
+			rte_iova_t iova;
+			void *cur;
+
+			offset = cur_pgsz * cur_page;
+			cur = RTE_PTR_ADD(addr, offset);
+
+			/* touch the page before finding its IOVA */
+			*(volatile char *)cur = *(volatile char *)cur;
+
+			iova = (uintptr_t)rte_mem_virt2iova(cur);
+
+			iovas[cur_page] = iova;
+
+			if (vfio_supported) {
+				/* map memory for DMA */
+				ret = rte_vfio_dma_map((uintptr_t)cur,
+						iova, cur_pgsz);
+				if (ret < 0) {
+					/*
+					 * ENODEV means VFIO is not initialized
+					 * ENOTSUP means current IOMMU mode
+					 * doesn't support mapping
+					 * both cases are not an error
+					 */
+					if (rte_errno == ENOTSUP ||
+							rte_errno == ENODEV)
+						/* VFIO is unsupported, don't
+						 * try again.
+						 */
+						vfio_supported = false;
+					else
+						/* this is an actual error */
+						goto fail;
+				}
+			}
+		}
+		/* lock memory if it's not huge pages */
+		if (!huge)
+			mlock(addr, mem_sz);
+
+		break;
+	}
+	/* if we couldn't allocate anything */
+	if (iovas == NULL)
+		return -1;
+
+	param->addr = addr;
+	param->len = mem_sz;
+	param->pgsz = cur_pgsz;
+	param->iova_table = iovas;
+	param->iova_table_len = n_pages;
+
+	return 0;
+fail:
+	if (iovas)
+		free(iovas);
+	if (addr)
+		munmap(addr, mem_sz);
+
+	return -1;
+}
+
+static int
+setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge)
+{
+	struct extmem_param param = {};
+	int socket_id, ret;
+
+	/* check if our heap exists */
+	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+	if (socket_id < 0) {
+		/* create our heap */
+		ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot create heap\n");
+			return -1;
+		}
+	}
+
+
+	ret = create_extmem(nb_mbufs, mbuf_sz, &param, huge);
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot create memory area\n");
+		return -1;
+	}
+
+	/* we now have a valid memory area, so add it to heap */
+	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
+			param.addr, param.len, param.iova_table,
+			param.iova_table_len, param.pgsz);
+
+	/* not needed any more */
+	free(param.iova_table);
+
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot add memory to heap\n");
+		munmap(param.addr, param.len);
+		return -1;
+	}
+
+	/* success */
+
+	TESTPMD_LOG(DEBUG, "Allocated %zuMB of external memory\n",
+			param.len >> 20);
+
+	return 0;
+}
+
 /*
  * Configuration initialisation done once at init time.
  */
@@ -545,27 +808,59 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
 		pool_name, nb_mbuf, mbuf_seg_size, socket_id);
 
-	if (mp_anon != 0) {
-		rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
-			mb_size, (unsigned) mb_mempool_cache,
-			sizeof(struct rte_pktmbuf_pool_private),
-			socket_id, 0);
-		if (rte_mp == NULL)
-			goto err;
+	switch (mp_alloc_type) {
+	case MP_ALLOC_NATIVE:
+		{
+			/* wrapper to rte_mempool_create() */
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+				mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+			break;
+		}
+	case MP_ALLOC_ANON:
+		{
+			rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				mb_size, (unsigned int) mb_mempool_cache,
+				sizeof(struct rte_pktmbuf_pool_private),
+				socket_id, 0);
+			if (rte_mp == NULL)
+				goto err;
+
+			if (rte_mempool_populate_anon(rte_mp) == 0) {
+				rte_mempool_free(rte_mp);
+				rte_mp = NULL;
+				goto err;
+			}
+			rte_pktmbuf_pool_init(rte_mp, NULL);
+			rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
+			break;
+		}
+	case MP_ALLOC_XMEM:
+	case MP_ALLOC_XMEM_HUGE:
+		{
+			int heap_socket;
+			bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE;
 
-		if (rte_mempool_populate_anon(rte_mp) == 0) {
-			rte_mempool_free(rte_mp);
-			rte_mp = NULL;
-			goto err;
+			if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0)
+				rte_exit(EXIT_FAILURE, "Could not create external memory\n");
+
+			heap_socket =
+				rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+			if (heap_socket < 0)
+				rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");
+
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+					mb_mempool_cache, 0, mbuf_seg_size,
+					heap_socket);
+			break;
+		}
+	default:
+		{
+			rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n");
 		}
-		rte_pktmbuf_pool_init(rte_mp, NULL);
-		rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
-	} else {
-		/* wrapper to rte_mempool_create() */
-		TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
-				rte_mbuf_best_mempool_ops());
-		rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-			mb_mempool_cache, 0, mbuf_seg_size, socket_id);
 	}
 
 err:
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a1f661472..65e0cec90 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -69,6 +69,16 @@ enum {
 	PORT_TOPOLOGY_LOOP,
 };
 
+enum {
+	MP_ALLOC_NATIVE, /**< allocate and populate mempool natively */
+	MP_ALLOC_ANON,
+	/**< allocate mempool natively, but populate using anonymous memory */
+	MP_ALLOC_XMEM,
+	/**< allocate and populate mempool using anonymous memory */
+	MP_ALLOC_XMEM_HUGE
+	/**< allocate and populate mempool using anonymous hugepage memory */
+};
+
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 /**
  * The data structure associated with RX and TX packet burst statistics
@@ -304,7 +314,8 @@ extern uint8_t  numa_support; /**< set by "--numa" parameter */
 extern uint16_t port_topology; /**< set by "--port-topology" parameter */
 extern uint8_t no_flush_rx; /**<set by "--no-flush-rx" parameter */
 extern uint8_t flow_isolate_all; /**< set by "--flow-isolate-all */
-extern uint8_t  mp_anon; /**< set by "--mp-anon" parameter */
+extern uint8_t  mp_alloc_type;
+/**< set by "--mp-anon" or "--mp-alloc" parameter */
 extern uint8_t no_link_check; /**<set by "--disable-link-check" parameter */
 extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f301c2b6f..67a8532a4 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -498,3 +498,15 @@ The commandline options are:
 *   ``--no-mlockall``
 
     Disable locking all memory.
+
+*   ``--mp-alloc <native|anon|xmem|xmemhuge>``
+
+    Select mempool allocation mode:
+
+    * native: create and populate mempool using native DPDK memory
+    * anon: create mempool using native DPDK memory, but populate using
+      anonymous memory
+    * xmem: create and populate mempool using externally and anonymously
+      allocated area
+    * xmemhuge: create and populate mempool using externally and anonymously
+      allocated hugepage area
-- 
2.17.1
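
For anyone who wants to check the sizing arithmetic from the testpmd patch
above without building DPDK, here is a minimal standalone sketch of the two
helpers. It is not the patch code: the names estimate_mem_size(), log2_pow2()
and SKETCH_HUGE_SHIFT are made up for illustration, the header allowance is
passed in as a parameter instead of being fixed at 32MB, and the flag
computation is done in unsigned arithmetic because log2(16G) << 26 does not
fit in a signed 32-bit int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same value as the patch's fallback for MAP_HUGE_SHIFT. */
#define SKETCH_HUGE_SHIFT 26

/* log2 of a power-of-two value (hypothetical helper). */
static unsigned int
log2_pow2(uint64_t v)
{
	unsigned int n = 0;

	assert(v != 0 && (v & (v - 1)) == 0);
	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/* mmap() page size selector: log2(page size) shifted by the huge shift. */
static unsigned int
pagesz_flags(uint64_t page_sz)
{
	return log2_pow2(page_sz) << SKETCH_HUGE_SHIFT;
}

/*
 * Pessimistic size estimate: objects never straddle a page, so count
 * whole pages, add a fixed allowance for headers, round up to a page.
 * Returns 0 on success, -1 if an object cannot fit into one page.
 */
static int
estimate_mem_size(uint32_t nb_objs, uint64_t obj_sz, uint64_t pgsz,
		uint64_t hdr_mem, uint64_t *out)
{
	uint64_t objs_per_pg, n_pages, total;

	if (obj_sz == 0 || pgsz == 0 || obj_sz > pgsz)
		return -1;

	objs_per_pg = pgsz / obj_sz;
	n_pages = nb_objs / objs_per_pg + (nb_objs % objs_per_pg != 0);

	total = hdr_mem + n_pages * pgsz;
	/* round up to a multiple of the page size */
	total = (total + pgsz - 1) / pgsz * pgsz;

	*out = total;
	return 0;
}
```

With 2M pages, 1000 objects of a (hypothetical) 2176 bytes each fit 963 to a
page, so two data pages are needed on top of the header allowance.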

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v4 19/20] doc: add external memory feature to the release notes
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (18 preceding siblings ...)
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 18/20] app/testpmd: add support for external memory Anatoly Burakov
@ 2018-09-21 16:14     ` Anatoly Burakov
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 20/20] doc: add external memory feature to programmer's guide Anatoly Burakov
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:14 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

Document the addition of external memory support to DPDK.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 63bbb1b51..9a05c9980 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,6 +67,11 @@ New Features
   SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
   vdev_netvsc, tap, and failsafe drivers combination.
 
+* **Added support for using externally allocated memory in DPDK.**
+
+  DPDK has gained support for creating new ``rte_malloc`` heaps referencing
+  memory that was created outside of DPDK's own page allocator, and using that
+  memory natively with any other DPDK library or data structure.
 
 API Changes
 -----------
-- 
2.17.1

* [dpdk-dev] [PATCH v4 20/20] doc: add external memory feature to programmer's guide
  2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                       ` (19 preceding siblings ...)
  2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 19/20] doc: add external memory feature to the release notes Anatoly Burakov
@ 2018-09-21 16:14     ` Anatoly Burakov
  20 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-21 16:14 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

Add a short chapter on usage of external memory in DPDK to the
Programmer's Guide.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d362c9209..37de8d63d 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -213,6 +213,44 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
+Support for Externally Allocated Memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to use externally allocated memory in DPDK, using a set of malloc
+heap APIs. Support for externally allocated memory is implemented through
+overloading the socket ID - externally allocated heaps will have socket IDs
+that would be considered invalid under normal circumstances. Requesting an
+allocation to take place from a specified externally allocated memory is a
+matter of supplying the correct socket ID to DPDK allocator, either directly
+(e.g. through a call to ``rte_malloc``) or indirectly (through data
+structure-specific allocation APIs such as ``rte_ring_create``).
+
+Since there is no way DPDK can verify whether a memory area is available or
+valid, this responsibility falls on the shoulders of the user. All multiprocess
+synchronization is also the user's responsibility, as well as ensuring that all
+calls to add/attach/detach/remove memory are done in the correct order. It is
+not required to attach to a memory area in all processes - only attach to memory
+areas as needed.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Create a named heap
+* Add memory area(s) to the heap
+  * If IOVA table is not specified, IOVA addresses will be assumed to be
+    unavailable
+  * Any DMA mappings for the external area are the responsibility of the user
+  * Other processes must attach to the memory area before they can use it
+* Get socket ID used for the heap
+* Use normal DPDK allocation procedures, using supplied socket ID
+* If memory area is no longer needed, it can be removed from the heap
+  * Other processes must detach from this memory area before it can be removed
+* If heap is no longer needed, remove it
+  * Socket ID will become invalid and will not be reused
+
+For more information, please refer to ``rte_malloc`` API documentation,
+specifically the ``rte_malloc_heap_*`` family of function calls.
+
 PCI Access
 ~~~~~~~~~~
 
-- 
2.17.1

* Re: [dpdk-dev] [PATCH v4 00/20] Support externally allocated memory in DPDK
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
@ 2018-09-23 21:21       ` Thomas Monjalon
  2018-09-24  8:54         ` Burakov, Anatoly
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                         ` (21 subsequent siblings)
  22 siblings, 1 reply; 225+ messages in thread
From: Thomas Monjalon @ 2018-09-23 21:21 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, shreyansh.jain, shahafs, arybchenko

Hi Anatoly,

21/09/2018 18:13, Anatoly Burakov:
> This is a proposal to enable using externally allocated memory
> in DPDK.

About this change and previous ones, I think we may miss some
documentation about the usage and the internal design of the DPDK
memory allocation.
You already updated some doc recently:
	http://git.dpdk.org/dpdk/commit/?id=b31739328

This is what we have currently:
	http://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html#memory-segments-and-memory-zones-memzone
	http://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html#malloc
	http://doc.dpdk.org/guides/prog_guide/mempool_lib.html

This is probably a good time to check this doc again.
Do you think it deserves more explanations, or maybe some figures?

* Re: [dpdk-dev] [PATCH v4 00/20] Support externally allocated memory in DPDK
  2018-09-23 21:21       ` Thomas Monjalon
@ 2018-09-24  8:54         ` Burakov, Anatoly
  0 siblings, 0 replies; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-24  8:54 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, shreyansh.jain, shahafs, arybchenko

On 23-Sep-18 10:21 PM, Thomas Monjalon wrote:
> Hi Anatoly,
> 
> 21/09/2018 18:13, Anatoly Burakov:
>> This is a proposal to enable using externally allocated memory
>> in DPDK.
> 
> About this change and previous ones, I think we may miss some
> documentation about the usage and the internal design of the DPDK
> memory allocation.
> You already updated some doc recently:
> 	http://git.dpdk.org/dpdk/commit/?id=b31739328
> 
> This is what we have currently:
> 	http://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html#memory-segments-and-memory-zones-memzone
> 	http://doc.dpdk.org/guides/prog_guide/env_abstraction_layer.html#malloc
> 	http://doc.dpdk.org/guides/prog_guide/mempool_lib.html
> 
> This is probably a good time to check this doc again.
> Do you think it deserves more explanations, or maybe some figures?
> 

Maybe this could be split into two sections - explanation of user-facing
API, and explanation of its inner workings. However, I don't want the
DPDK documentation to become my personal soapbox, so I'm open to
suggestions on what is missing and how to organize the memory docs better :)

-- 
Thanks,
Anatoly

* [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
  2018-09-23 21:21       ` Thomas Monjalon
@ 2018-09-26 11:21       ` Anatoly Burakov
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                           ` (21 more replies)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 01/21] mem: add length to memseg list Anatoly Burakov
                         ` (20 subsequent siblings)
  22 siblings, 22 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:21 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have IDs in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
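
To make the socket ID idea above more concrete, here is a rough standalone
sketch of a heap table in which every named heap gets a socket ID that is
never handed out twice. None of this is the actual DPDK code: the struct
layout, the table size, the base value 256 and all function names are
hypothetical, and locking is left out.

```c
#include <assert.h>
#include <string.h>

#define MAX_HEAPS 8		/* hypothetical table size */
#define EXT_SOCKET_ID_BASE 256	/* hypothetical first external socket ID */

struct heap {
	char name[32];	/* empty name means the slot is unused */
	int socket_id;
};

struct heap_table {
	struct heap heaps[MAX_HEAPS];
	int next_ext_socket_id;	/* only grows, so IDs are never reused */
};

static void
heap_table_init(struct heap_table *t)
{
	memset(t, 0, sizeof(*t));
	t->next_ext_socket_id = EXT_SOCKET_ID_BASE;
}

/* Create a named heap; returns its new socket ID, or -1 on failure. */
static int
heap_create(struct heap_table *t, const char *name)
{
	int i, free_idx = -1;

	assert(t != NULL && name != NULL);
	if (name[0] == '\0' || strlen(name) >= sizeof(t->heaps[0].name))
		return -1;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (t->heaps[i].name[0] == '\0') {
			if (free_idx < 0)
				free_idx = i;
		} else if (strcmp(t->heaps[i].name, name) == 0) {
			return -1;	/* heap names must be unique */
		}
	}
	if (free_idx < 0)
		return -1;	/* table is full */

	strcpy(t->heaps[free_idx].name, name);
	t->heaps[free_idx].socket_id = t->next_ext_socket_id++;
	return t->heaps[free_idx].socket_id;
}

/* Destroy a named heap; its socket ID becomes invalid for good. */
static int
heap_destroy(struct heap_table *t, const char *name)
{
	int i;

	assert(t != NULL && name != NULL);
	if (name[0] == '\0')
		return -1;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (strcmp(t->heaps[i].name, name) == 0) {
			t->heaps[i].name[0] = '\0';
			t->heaps[i].socket_id = -1;
			return 0;
		}
	}
	return -1;
}

/* Map a socket ID to a heap index; -1 if no live heap uses this ID. */
static int
socket_id_to_heap_idx(const struct heap_table *t, int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (t->heaps[i].name[0] != '\0' &&
				t->heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

The point of the sketch is that the allocator only ever sees a socket ID, and
that recreating a heap under the same name yields a fresh ID, so a stale ID
cannot silently resolve to a different heap.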

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
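
As an illustration of what "ensuring memory is accessible" can look like on
the user side, here is a small POSIX-only sketch that maps an anonymous area
of regular pages and touches every page so it is faulted in before being
handed to anything else. It is loosely modelled on what the testpmd patch in
this series does for its "xmem" mode; area_create() and area_destroy() are
made-up names, and hugepages, IOVA lookup and DMA mapping are left out.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map an anonymous private area of n_pages regular pages and fault in
 * every page. Returns the area and stores its length in *len_out, or
 * returns NULL on failure.
 */
static void *
area_create(size_t n_pages, size_t *len_out)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	size_t len, off;
	void *addr;

	assert(len_out != NULL);
	if (pgsz <= 0 || n_pages == 0 || n_pages > SIZE_MAX / (size_t)pgsz)
		return NULL;
	len = n_pages * (size_t)pgsz;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* touch each page so that it is actually backed by memory */
	for (off = 0; off < len; off += (size_t)pgsz) {
		volatile char *p = (volatile char *)addr + off;
		*p = *p;
	}

	*len_out = len;
	return addr;
}

/* Unmap an area created by area_create(); returns 0 on success. */
static int
area_destroy(void *addr, size_t len)
{
	return munmap(addr, len);
}
```

An area obtained this way is the kind of thing that would then be added to a
named heap; whether it is usable for DMA is a separate question that this
sketch does not address.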

The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.

Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes

v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
  IOVA-contiguousness when dealing with externally allocated memory

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
  comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (21):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: add function to check if socket is external
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  malloc: enable event callbacks for external memory
  test: add unit tests for external memory support
  app/testpmd: add support for external memory
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide

 app/test-pmd/config.c                         |  21 +-
 app/test-pmd/parameters.c                     |  23 +-
 app/test-pmd/testpmd.c                        | 305 ++++++++++++-
 app/test-pmd/testpmd.h                        |  13 +-
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  37 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  28 +-
 doc/guides/testpmd_app_ug/run_app.rst         |  12 +
 drivers/bus/fslmc/fslmc_vfio.c                |  14 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 .../net/virtio/virtio_user/virtio_user_dev.c  |   8 +
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   9 +-
 lib/librte_eal/common/include/rte_malloc.h    | 192 ++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_elem.c           |  10 +-
 lib/librte_eal/common/malloc_heap.c           | 316 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 429 +++++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  27 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   8 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  57 ++-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 ++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 47 files changed, 1913 insertions(+), 139 deletions(-)
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1

* [dpdk-dev] [PATCH v5 01/21] mem: add length to memseg list
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
  2018-09-23 21:21       ` Thomas Monjalon
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
                         ` (19 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, shreyansh.jain, shahafs, arybchenko

Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.
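
A cut-down sketch of the difference, for illustration only. The structs below
are hypothetical stand-ins that keep just the fields involved, and the
function names are made up; the real definitions are in rte_eal_memconfig.h
and the fbarray headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for the real DPDK structures. */
struct fbarray_sketch {
	unsigned int len;	/* number of page slots in the array */
};

struct memseg_list_sketch {
	void *base_va;
	size_t len;		/* length of the covered area, stored directly */
	uint64_t page_sz;
	struct fbarray_sketch memseg_arr;
};

/* Old style: derive the end address from page size and array length. */
static uintptr_t
msl_end_derived(const struct memseg_list_sketch *msl)
{
	return (uintptr_t)msl->base_va +
			(size_t)msl->page_sz * msl->memseg_arr.len;
}

/* New style: use the stored length. */
static uintptr_t
msl_end_stored(const struct memseg_list_sketch *msl)
{
	return (uintptr_t)msl->base_va + msl->len;
}

/* Return 1 if addr lies inside the area covered by the list, else 0. */
static int
msl_contains(const struct memseg_list_sketch *msl, const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	assert(msl != NULL);
	return a >= (uintptr_t)msl->base_va && a < msl_end_stored(msl);
}
```

The two end-address helpers agree only when the stored length was set to the
page size times the array length; callers of the stored form no longer need to
know about the backing array at all.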

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/bus/pci/linux/pci.c                       | 2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
 lib/librte_eal/common/eal_common_memory.c         | 5 ++---
 lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
 6 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 static int
 find_max_end_va(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t sz = msl->memseg_arr.len * msl->page_sz;
+	size_t sz = msl->len;
 	void *end_va = RTE_PTR_ADD(msl->base_va, sz);
 	void **max_va = arg;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
 		}
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
+		msl->len = internal_config.memory;
 		msl->socket_id = 0;
 
 		/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
 
 	/* a memseg list was specified, check if it's the right one */
 	start = msl->base_va;
-	end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+	end = RTE_PTR_ADD(start, msl->len);
 
 	if (addr < start || addr >= end)
 		return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
 		msl = &mcfg->memsegs[msl_idx];
 
 		start = msl->base_va;
-		end = RTE_PTR_ADD(start,
-				(size_t)msl->page_sz * msl->memseg_arr.len);
+		end = RTE_PTR_ADD(start, msl->len);
 		if (addr >= start && addr < end)
 			break;
 	}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
+	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	int msl_idx, seg_idx, ret, dir_fd = -1;
 
 	start_addr = (uintptr_t) msl->base_va;
-	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+	end_addr = start_addr + msl->len;
 
 	if ((uintptr_t)wa->ms->addr < start_addr ||
 			(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 		return -1;
 	}
 	local_msl->base_va = primary_msl->base_va;
+	local_msl->len = primary_msl->len;
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
 		msl->socket_id = 0;
+		msl->len = internal_config.memory;
 
 		/* populate memsegs. each memseg is one page long */
 		for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
 		if (msl->memseg_arr.count > 0)
 			continue;
 		/* this is an unused list, deallocate it */
-		mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
 
-- 
2.17.1

* [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (2 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
                         ` (18 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.

All current calls to memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
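
To illustrate the matching rule described above, here is a minimal
sketch in plain C. It does not use the EAL headers: `struct pagesz_msl`,
`SKETCH_SOCKET_ID_ANY` and `min_page_size()` are hypothetical names made
up for this example, and the caller-supplied `fallback` stands in for
the `getpagesize()` call used in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SOCKET_ID_ANY, which is -1 in DPDK. */
#define SKETCH_SOCKET_ID_ANY (-1)

/* Hypothetical, reduced stand-in for struct rte_memseg_list. */
struct pagesz_msl {
	int socket_id;         /* socket ID for all memsegs in the list */
	uint64_t page_sz;      /* page size for all memsegs in the list */
	unsigned int external; /* 1 if the list points to external memory */
};

/*
 * Return the smallest page size usable for socket_id. A list counts if
 * its socket ID matches exactly (internal or external memory), or if
 * "any socket" was requested and the list is internal.
 */
static size_t
min_page_size(const struct pagesz_msl *lists, int n, int socket_id,
		size_t fallback)
{
	size_t min = SIZE_MAX;
	int i;

	for (i = 0; i < n; i++) {
		bool valid = lists[i].socket_id == socket_id;

		valid |= socket_id == SKETCH_SOCKET_ID_ANY &&
				lists[i].external == 0;

		if (valid && lists[i].page_sz < min)
			min = lists[i].page_sz;
	}
	/* no list matched: fall back to the caller-supplied page size */
	return min == SIZE_MAX ? fallback : min;
}
```

With this rule, a 4K-page external list is only considered when its own
socket ID is requested, and does not lower the minimum page size for
requests on internal sockets or on "any socket".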

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so
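
As an illustration of the walk-callback adjustment mentioned in the
notes, here is a minimal sketch in plain C. It does not use the real EAL
headers: `struct msl_sketch`, `walk_lists()` and `internal_total()` are
hypothetical stand-ins for `struct rte_memseg_list` and
`rte_memseg_list_walk()`, reduced to what the example needs. The early
return on `external` is the same pattern this patch adds to the
callbacks.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct rte_memseg_list. */
struct msl_sketch {
	int socket_id;         /* socket ID for all memsegs in the list */
	unsigned int external; /* 1 if the list points to external memory */
	size_t len;            /* length of memory covered by the list */
};

typedef int (*msl_walk_t)(const struct msl_sketch *msl, void *arg);

/*
 * Simplified stand-in for a memseg list walk: call func on every list
 * and stop early if the callback returns non-zero.
 */
static int
walk_lists(const struct msl_sketch *lists, int n, msl_walk_t func, void *arg)
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = func(&lists[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback: add up internal memory only, skipping external lists. */
static int
sum_internal(const struct msl_sketch *msl, void *arg)
{
	size_t *total = arg;

	if (msl->external)
		return 0; /* skip this list, keep walking */

	*total += msl->len;
	return 0;
}

/* Convenience wrapper: total length of all internal lists. */
static size_t
internal_total(const struct msl_sketch *lists, int n)
{
	size_t total = 0;

	walk_lists(lists, n, sum_internal, &total);
	return total;
}
```

Returning 0 from the callback means "keep walking", so an external list
is ignored without ending the walk.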

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        | 13 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_elem.c           | 10 ++++--
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/common/rte_malloc.c            |  2 +-
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 24 files changed, 134 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -107,6 +114,10 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+  supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+  a new flag indicating whether the memseg list refers to external memory.
+
 Removed Items
 -------------
 
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 	contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
 
 	/* if we're in IOVA as VA mode, or if we're in legacy mode with
-	 * hugepages, all elements are IOVA-contiguous.
+	 * hugepages, all elements are IOVA-contiguous. however, we can only
+	 * make these assumptions about internal memory - externally allocated
+	 * segments have to be checked.
 	 */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA ||
-			(internal_config.legacy_mem && rte_eal_has_hugepages()))
+	if (!elem->msl->external &&
+			(rte_eal_iova_mode() == RTE_IOVA_VA ||
+				(internal_config.legacy_mem &&
+					rte_eal_has_hugepages())))
 		return RTE_PTR_DIFF(data_end, contig_seg_start);
 
 	cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
 	if (elem == NULL)
 		return RTE_BAD_IOVA;
 
-	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+	if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return (uintptr_t) addr;
 
 	ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1

* [dpdk-dev] [PATCH v5 03/21] malloc: index heaps using heap ID rather than NUMA node
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (3 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
                         ` (17 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bruce Richardson, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, shreyansh.jain, shahafs, arybchenko

Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is the NUMA
node's index within the detected NUMA node list. Heap IDs for
external heaps will follow the order of their creation.
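
To illustrate the socket ID to heap ID translation described above,
here is a minimal sketch in plain C. `struct heap_sketch` and
`socket_to_heap_id()` are hypothetical names for this example only. The
explicit `n_heaps` count is an assumption of the sketch, whereas the
patch scans all RTE_MAX_HEAPS slots.

```c
/* Hypothetical, reduced stand-in for struct malloc_heap. */
struct heap_sketch {
	unsigned int socket_id; /* socket ID served by this heap */
};

/*
 * Return the index (heap ID) of the heap serving socket_id, or -1 if
 * no populated heap serves that socket ID.
 */
static int
socket_to_heap_id(const struct heap_sketch *heaps, int n_heaps,
		unsigned int socket_id)
{
	int i;

	for (i = 0; i < n_heaps; i++) {
		if (heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

For example, if the detected NUMA nodes are 0 and 2, their heaps get
heap IDs 0 and 1, and an external heap created afterwards gets heap ID
2 regardless of which socket ID it was given. A socket ID that no heap
serves yields -1, which callers would have to treat as an error.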

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_base                            |  1 +
 config/rte_config.h                           |  1 +
 .../common/include/rte_eal_memconfig.h        |  4 +-
 .../common/include/rte_malloc_heap.h          |  1 +
 lib/librte_eal/common/malloc_heap.c           | 98 +++++++++++++------
 lib/librte_eal/common/malloc_heap.h           |  3 +
 lib/librte_eal/common/rte_malloc.c            | 41 +++++---
 7 files changed, 106 insertions(+), 43 deletions(-)

diff --git a/config/common_base b/config/common_base
index 155c7d40e..b52770b27 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 567051b9c..5dd2ac1ad 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
 #define RTE_BUILD_SHARED_LIB
 
 /* EAL defines */
+#define RTE_MAX_HEAPS 32
 #define RTE_MAX_MEMSEG_LISTS 128
 #define RTE_MAX_MEMSEG_PER_LIST 8192
 #define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 6baa6854f..d7920a4e0 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
 
-	/* Heaps of Malloc per socket */
-	struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+	/* Heaps of Malloc */
+	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {
 
 	unsigned alloc_count;
 	size_t total_size;
+	unsigned int socket_id;
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..1d1e35708 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 	return check_flag & flags;
 }
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (heap->socket_id == socket_id)
+			return i;
+	}
+	return -1;
+}
+
 /*
  * Expand the heap with a memory area.
  */
@@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct rte_memseg_list *found_msl;
 	struct malloc_heap *heap;
-	int msl_idx;
+	int msl_idx, heap_idx;
 
 	if (msl->external)
 		return 0;
 
-	heap = &mcfg->malloc_heaps[msl->socket_id];
+	heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+	heap = &mcfg->malloc_heaps[heap_idx];
 
 	/* msl is const, so find it */
 	msl_idx = msl - mcfg->memsegs;
@@ -111,6 +127,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
+	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -561,12 +578,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 
 /* this will try lower page sizes first */
 static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
-		unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+	int socket_id;
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -584,12 +603,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
 	 * we may still be able to allocate memory from appropriate page sizes,
 	 * we just need to request more memory first.
 	 */
+
+	socket_id = rte_socket_id_by_idx(heap_id);
+	/*
+	 * if socket ID is negative, we cannot find a socket ID for this heap -
+	 * which means it's an external heap. those can have unexpected page
+	 * sizes, so if the user asked to allocate from there - assume user
+	 * knows what they're doing, and allow allocating from there with any
+	 * page size flags.
+	 */
+	if (socket_id < 0)
+		size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
 	ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
 	if (ret != NULL)
 		goto alloc_unlock;
 
-	if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
-			contig)) {
+	/* if socket ID is invalid, this is an external heap */
+	if (socket_id < 0)
+		goto alloc_unlock;
+
+	if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+			bound, contig)) {
 		ret = heap_alloc(heap, type, size, flags, align, bound, contig);
 
 		/* this should have succeeded */
@@ -605,7 +640,7 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, heap_id, i;
 	void *ret;
 
 	/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +655,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
-			contig);
+	ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+			bound, contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
-	/* try other heaps */
+	/* try other heaps. we are only iterating through native DPDK sockets,
+	 * so external heaps won't be included.
+	 */
 	for (i = 0; i < (int) rte_socket_count(); i++) {
-		cur_socket = rte_socket_id_by_idx(i);
-		if (cur_socket == socket)
+		if (i == heap_id)
 			continue;
-		ret = heap_alloc_on_socket(type, size, cur_socket, flags,
-				align, bound, contig);
+		ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+				bound, contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -643,11 +681,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 }
 
 static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
-		size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+		unsigned int flags, size_t align, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -665,7 +703,7 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		size_t align, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, i, cur_socket, heap_id;
 	void *ret;
 
 	/* return NULL if align is not power-of-2 */
@@ -680,11 +718,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+	ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
 			contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
@@ -694,8 +734,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		cur_socket = rte_socket_id_by_idx(i);
 		if (cur_socket == socket)
 			continue;
-		ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
-				align, contig);
+		ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+				contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -760,7 +800,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* ...of which we can't avail if we are in legacy mode, or if this is an
 	 * externally allocated segment.
 	 */
-	if (internal_config.legacy_mem || msl->external)
+	if (internal_config.legacy_mem || (msl->external > 0))
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
@@ -917,7 +957,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 int
 malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +995,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f);
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
 int
 rte_eal_malloc_heap_init(void);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 47ca5a742..73d6df31d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx, ret = -1;
 
-	if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
-		return -1;
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
 
-	return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+	heap_idx = malloc_socket_to_heap_id(socket);
+	if (heap_idx < 0)
+		goto unlock;
+
+	ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+			socket_stats);
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
 }
 
 /*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int idx;
 
-	for (idx = 0; idx < rte_socket_count(); idx++) {
-		unsigned int socket = rte_socket_id_by_idx(idx);
-		fprintf(f, "Heap on socket %i:\n", socket);
-		malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		fprintf(f, "Heap id: %u\n", idx);
+		malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
 	}
 
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
 /*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
 void
 rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 {
-	unsigned int socket;
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int heap_id;
 	struct rte_malloc_socket_stats sock_stats;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
 	/* Iterate through all initialised heaps */
-	for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
-		if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
-			continue;
+	for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 
-		fprintf(f, "Socket:%u\n", socket);
+		malloc_heap_get_stats(heap, &sock_stats);
+
+		fprintf(f, "Heap id:%u\n", heap_id);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
 		fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
 	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	return;
 }
 
-- 
2.17.1


* [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (4 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 05/21] flow_classify: " Anatoly Burakov
                         ` (16 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

External heaps will be assigned "invalid" socket IDs (at or above
RTE_MAX_NUMA_NODES), and malloc will now be able to verify whether
a supplied socket ID is in fact valid, rendering upper-bound checks
on socket parameters obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
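
As a rough illustration of the idea (a standalone sketch, not code from
this series): the heap table is searched for a matching socket ID, and a
negative result takes the place of the old range check. The names
toy_heap, toy_socket_to_heap_id and TOY_MAX_NUMA_NODES are hypothetical;
the real lookup is malloc_socket_to_heap_id(), whose body is not part of
this patch.

```c
#include <stddef.h>

#define TOY_MAX_NUMA_NODES 8	/* assumed stand-in for RTE_MAX_NUMA_NODES */

/* Hypothetical, stripped-down heap descriptor. */
struct toy_heap {
	int in_use;	/* nonzero if this slot holds a heap */
	int socket_id;	/* socket ID the heap answers to */
};

/* Return the index of the heap with the given socket ID, or -1. */
static int
toy_socket_to_heap_id(const struct toy_heap *heaps, size_t n_heaps,
		int socket_id)
{
	size_t i;

	for (i = 0; i < n_heaps; i++) {
		if (heaps[i].in_use && heaps[i].socket_id == socket_id)
			return (int)i;
	}
	/* unknown socket ID: the caller treats this as invalid */
	return -1;
}

/* Two NUMA heaps plus one external heap with an ID past the NUMA range. */
static const struct toy_heap toy_heaps[] = {
	{ 1, 0 },
	{ 1, 1 },
	{ 1, TOY_MAX_NUMA_NODES },
	{ 0, 0 },
};
```

With a lookup of this kind, a socket ID is valid exactly when some heap
claims it, so callers no longer need to compare against
RTE_MAX_NUMA_NODES themselves.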

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - ``socket_id`` parameter across the entire DPDK has gained additional
+    meaning, as some socket ID's will now be representing externally allocated
+    memory. No changes will be required for existing code as backwards
+    compatibility will be kept, and those who do not use this feature will not
+    see these extra socket ID's. Any new API's must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether socket ID is a valid one.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1


* [dpdk-dev] [PATCH v5 05/21] flow_classify: do not check for invalid socket ID
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (5 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 06/21] pipeline: " Anatoly Burakov
                         ` (15 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: Bernard Iremonger, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

External heaps will be assigned "invalid" socket IDs (at or above
RTE_MAX_NUMA_NODES), and malloc will now be able to verify whether
a supplied socket ID is in fact valid, rendering upper-bound checks
on socket parameters obsolete.
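
Seen from a library's side, the change amounts to something like the
sketch below: only negative IDs are rejected up front, and anything else
is left for the allocator to accept or refuse. The toy_params structure
and toy_check_params() are hypothetical stand-ins for a library's own
parameter check, not DPDK API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical parameter block carrying a socket ID. */
struct toy_params {
	int socket_id;
};

/*
 * Reject only what can never be a socket ID. Large values may name an
 * external heap, so they are passed through to the memory subsystem.
 */
static int
toy_check_params(const struct toy_params *params)
{
	if (params == NULL)
		return -EINVAL;
	if (params->socket_id < 0)
		return -EINVAL;
	return 0;
}
```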

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_flow_classify/rte_flow_classify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_FLOW_CLASSIFY_LOG(ERR,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1


* [dpdk-dev] [PATCH v5 06/21] pipeline: do not check for invalid socket ID
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (6 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 05/21] flow_classify: " Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 07/21] sched: " Anatoly Burakov
                         ` (14 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

External heaps will be assigned "invalid" socket IDs (at or above
RTE_MAX_NUMA_NODES), and malloc will now be able to verify whether
a supplied socket ID is in fact valid, rendering upper-bound checks
on socket parameters obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_pipeline/rte_pipeline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_LOG(ERR, PIPELINE,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1


* [dpdk-dev] [PATCH v5 07/21] sched: do not check for invalid socket ID
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (7 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 06/21] pipeline: " Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
                         ` (13 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

External heaps will be assigned "invalid" socket IDs (at or above
RTE_MAX_NUMA_NODES), and malloc will now be able to verify whether
a supplied socket ID is in fact valid, rendering upper-bound checks
on socket parameters obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_sched/rte_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
 		return -1;
 
 	/* socket */
-	if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+	if (params->socket < 0)
 		return -3;
 
 	/* rate */
-- 
2.17.1


* [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (8 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 07/21] sched: " Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
                         ` (12 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

We will need a way to refer to external heaps. Heap IDs are used
internally, but the external API needs something more
user-friendly, so each heap will be uniquely identified by a
string.
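
For the default heaps, the naming scheme in this patch boils down to
formatting "socket_<id>" into a fixed-size buffer. A minimal standalone
sketch follows; toy_default_heap_name() and TOY_HEAP_NAME_MAX_LEN are
hypothetical names, with the latter assumed to mirror
RTE_HEAP_NAME_MAX_LEN, and the truncation check is an addition for
illustration only.

```c
#include <stddef.h>
#include <stdio.h>

#define TOY_HEAP_NAME_MAX_LEN 32	/* assumed to mirror RTE_HEAP_NAME_MAX_LEN */

/*
 * Write "socket_<id>" into dst. Returns 0 on success, or -1 if the name
 * did not fit (snprintf still NUL-terminates when len > 0).
 */
static int
toy_default_heap_name(char *dst, size_t len, int socket_id)
{
	int n = snprintf(dst, len, "socket_%i", socket_id);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```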

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst          |  1 +
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 17 ++++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
 * eal: EAL library ABI version was changed due to previously announced work on
        supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
        a new flag indicating whether the memseg list refers to external memory.
+       Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	size_t total_size;
 	unsigned int socket_id;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 73e478076..ac89d15a4 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1020,6 +1019,22 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign names to default DPDK heaps */
+		for (i = 0; i < rte_socket_count(); i++) {
+			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+			char heap_name[RTE_HEAP_NAME_MAX_LEN];
+			int socket_id = rte_socket_id_by_idx(i);
+
+			snprintf(heap_name, sizeof(heap_name) - 1,
+					"socket_%i", socket_id);
+			strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+			heap->socket_id = socket_id;
+		}
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1


* [dpdk-dev] [PATCH v5 09/21] malloc: add function to query socket ID of named heap
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (9 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 10/21] malloc: add function to check if socket is external Anatoly Burakov
                         ` (11 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

External heaps, once created, will each have their own "fake"
socket ID, so add a function that maps a heap name to its
socket ID.
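
The lookup can be pictured with the standalone sketch below, which
mirrors the error reporting described in the header comment (EINVAL for
a bad name, ENOENT when no heap matches). It sets plain errno rather
than rte_errno and takes no lock; toy_named_heap, toy_heap_get_socket
and the sample table are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_HEAP_NAME_MAX_LEN 32	/* assumed to mirror RTE_HEAP_NAME_MAX_LEN */

/* Hypothetical heap record: just a name and a socket ID. */
struct toy_named_heap {
	char name[TOY_HEAP_NAME_MAX_LEN];
	int socket_id;
};

/* Return the socket ID of the heap called name, or -1 with errno set. */
static int
toy_heap_get_socket(const struct toy_named_heap *heaps, size_t n_heaps,
		const char *name)
{
	size_t i, len;

	if (name == NULL) {
		errno = EINVAL;
		return -1;
	}
	/* bounded length check: reject empty and over-long names */
	for (len = 0; len < TOY_HEAP_NAME_MAX_LEN && name[len] != '\0'; len++)
		;
	if (len == 0 || len == TOY_HEAP_NAME_MAX_LEN) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < n_heaps; i++) {
		if (strncmp(name, heaps[i].name, TOY_HEAP_NAME_MAX_LEN) == 0)
			return heaps[i].socket_id;
	}
	errno = ENOENT;
	return -1;
}

/* One default heap, one external heap, one unused slot. */
static const struct toy_named_heap toy_named_heaps[] = {
	{ "socket_0", 0 },
	{ "ext_heap", 256 },
	{ "", 0 },
};
```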

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 37 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..8870732a6 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,20 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Find socket ID corresponding to a named heap.
+ *
+ * @param name
+ *   Heap name to find socket ID for
+ * @return
+ *   Socket ID in case of success (a non-negative number)
+ *   -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``name`` was NULL
+ *     ENOENT - heap identified by the name ``name`` was not found
+ */
+int __rte_experimental
+rte_malloc_heap_get_socket(const char *name);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72632da56..b807dfe09 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <sys/queue.h>
 
+#include <rte_errno.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
 #include <rte_eal.h>
@@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f)
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
+int
+rte_malloc_heap_get_socket(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int idx;
+	int ret;
+
+	if (name == NULL ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) {
+			heap = tmp;
+			break;
+		}
+	}
+
+	if (heap != NULL) {
+		ret = heap->socket_id;
+	} else {
+		rte_errno = ENOENT;
+		ret = -1;
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bbb0..d8f9665b8 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v5 10/21] malloc: add function to check if socket is external
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (10 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating malloc heaps Anatoly Burakov
                         ` (10 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs

An API is needed to check whether a particular socket ID belongs
to an internal or an external heap. The prime user of this is the
mempool allocator, because the usual assumption of IOVA
contiguity in IOVA as VA mode does not hold for externally
allocated memory.
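
The effect on the mempool populate path can be summarised as a pure
function of four inputs. The sketch below restates the two expressions
from the rte_mempool.c hunk in standalone form; the structure and
function names are hypothetical, and the boolean arguments stand in for
the flag check and the rte_eal_iova_mode() / rte_eal_has_hugepages()
calls.

```c
#include <stdbool.h>

/* Hypothetical container for the two decisions made by the mempool code. */
struct toy_populate_flags {
	bool no_pageshift;
	bool try_contig;
};

static struct toy_populate_flags
toy_compute_flags(bool no_contig, bool iova_as_va, bool has_hugepages,
		bool external)
{
	struct toy_populate_flags f;

	/* IOVA as VA implies contiguity only for internal memory */
	f.no_pageshift = no_contig || (!external && iova_as_va);
	/* external memory may be IOVA-contiguous even without hugepages */
	f.try_contig = !no_contig && !f.no_pageshift &&
			(has_hugepages || external);
	return f;
}
```

For an internal heap in IOVA as VA mode this yields no_pageshift set and
try_contig clear, as before; for an external heap, a contiguous
reservation is attempted even in no-huge mode.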

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 15 +++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 25 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 lib/librte_mempool/rte_mempool.c           | 22 ++++++++++++++++---
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8870732a6..403271ddc 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -277,6 +277,21 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_get_socket(const char *name);
 
+/**
+ * Check if a given socket ID refers to externally allocated memory.
+ *
+ * @note Passing SOCKET_ID_ANY will return 0.
+ *
+ * @param socket_id
+ *   Socket ID to check
+ * @return
+ *   1 if socket ID refers to externally allocated memory
+ *   0 if socket ID refers to internal DPDK memory
+ *   -1 if socket ID is invalid
+ */
+int __rte_experimental
+rte_malloc_heap_socket_is_external(int socket_id);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b807dfe09..fa81d7862 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -220,6 +220,31 @@ rte_malloc_heap_get_socket(const char *name)
 	return ret;
 }
 
+int
+rte_malloc_heap_socket_is_external(int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int idx;
+	int ret = -1;
+
+	if (socket_id == SOCKET_ID_ANY)
+		return 0;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if ((int)tmp->socket_id == socket_id) {
+			/* external memory always has large socket ID's */
+			ret = tmp->socket_id >= RTE_MAX_NUMA_NODES;
+			break;
+		}
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d8f9665b8..bd60506af 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2ed539f01..683b216f9 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -428,12 +428,18 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	rte_iova_t iova;
 	unsigned mz_id, n;
 	int ret;
-	bool no_contig, try_contig, no_pageshift;
+	bool no_contig, try_contig, no_pageshift, external;
 
 	ret = mempool_ops_alloc_once(mp);
 	if (ret != 0)
 		return ret;
 
+	/* check if we can retrieve a valid socket ID */
+	ret = rte_malloc_heap_socket_is_external(mp->socket_id);
+	if (ret < 0)
+		return -EINVAL;
+	external = ret;
+
 	/* mempool must not be populated */
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
@@ -481,9 +487,19 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	 * in one contiguous chunk as well (otherwise we might end up wasting a
 	 * 1G page on a 10MB memzone). If we fail to get enough contiguous
 	 * memory, then we'll go and reserve space page-by-page.
+	 *
+	 * We also have to take into account the fact that memory that we're
+	 * going to allocate from can belong to an externally allocated memory
+	 * area, in which case the assumption of IOVA as VA mode being
+	 * synonymous with IOVA contiguousness will not hold. We should also try
+	 * to go for contiguous memory even if we're in no-huge mode, because
+	 * external memory may in fact be IOVA-contiguous.
 	 */
-	no_pageshift = no_contig || rte_eal_iova_mode() == RTE_IOVA_VA;
-	try_contig = !no_contig && !no_pageshift && rte_eal_has_hugepages();
+	external = rte_malloc_heap_socket_is_external(mp->socket_id) == 1;
+	no_pageshift = no_contig ||
+			(!external && rte_eal_iova_mode() == RTE_IOVA_VA);
+	try_contig = !no_contig && !no_pageshift &&
+			(rte_eal_has_hugepages() || external);
 
 	if (no_pageshift) {
 		pg_sz = 0;
-- 
2.17.1


* [dpdk-dev] [PATCH v5 11/21] malloc: allow creating malloc heaps
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (11 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 10/21] malloc: add function to check if socket is external Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 12/21] malloc: allow destroying heaps Anatoly Burakov
                         ` (9 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

Add an API for creating new malloc heaps. They are assigned socket
IDs starting at RTE_MAX_NUMA_NODES or higher, so they cannot clash
with internal heaps.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
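The slot search and socket ID assignment done by this patch can be sketched
as a standalone model. This is an illustration under stated assumptions, not
the DPDK implementation: every MODEL_* constant and model_* name is
hypothetical, locking is omitted, and the model returns the new socket ID
directly instead of returning 0 and setting rte_errno.

```c
#include <errno.h>
#include <string.h>

/* All MODEL_* values are assumptions chosen for illustration only. */
#define MODEL_MAX_NUMA_NODES 8 /* stands in for RTE_MAX_NUMA_NODES */
#define MODEL_MAX_HEAPS 4      /* stands in for RTE_MAX_HEAPS */
#define MODEL_NAME_LEN 32      /* stands in for RTE_HEAP_NAME_MAX_LEN */
/* same formula as EXTERNAL_HEAP_MIN_SOCKET_ID in the patch */
#define MODEL_MIN_EXT_ID \
	((1 << 8) > MODEL_MAX_NUMA_NODES ? (1 << 8) : MODEL_MAX_NUMA_NODES)

struct model_heap {
	char name[MODEL_NAME_LEN]; /* an empty name marks a free slot */
	int socket_id;
};

struct model_cfg {
	struct model_heap heaps[MODEL_MAX_HEAPS];
	int next_socket_id;
};

void
model_init(struct model_cfg *cfg)
{
	memset(cfg, 0, sizeof(*cfg));
	cfg->next_socket_id = MODEL_MIN_EXT_ID;
}

/* Returns the new heap's socket ID, or a negative errno value. */
int
model_heap_create(struct model_cfg *cfg, const char *name)
{
	size_t len;
	int i;

	if (name == NULL)
		return -EINVAL;
	len = strlen(name);
	if (len == 0 || len >= MODEL_NAME_LEN)
		return -EINVAL;

	for (i = 0; i < MODEL_MAX_HEAPS; i++) {
		struct model_heap *h = &cfg->heaps[i];

		if (strcmp(h->name, name) == 0)
			return -EEXIST;
		if (h->name[0] == '\0') {
			/* first free slot: claim it, hand out the next ID */
			memcpy(h->name, name, len + 1);
			h->socket_id = cfg->next_socket_id++;
			return h->socket_id;
		}
	}
	return -ENOSPC;
}
```

With these assumed constants the first two heaps get socket IDs 256 and 257,
and a failed creation does not consume an ID.
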
 doc/guides/rel_notes/release_18_11.rst        |  2 +
 .../common/include/rte_eal_memconfig.h        |  3 ++
 lib/librte_eal/common/include/rte_malloc.h    | 19 +++++++
 lib/librte_eal/common/malloc_heap.c           | 37 +++++++++++++
 lib/librte_eal/common/malloc_heap.h           |  3 ++
 lib/librte_eal/common/rte_malloc.c            | 52 +++++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  1 +
 7 files changed, 117 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
        supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
        a new flag indicating whether the memseg list refers to external memory.
        Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+       Structure ``rte_mem_config`` has been extended to contain the next socket
+       ID for externally allocated memory segments.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
 	/* Heaps of Malloc */
 	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
+	/* next socket ID for external malloc heap */
+	int next_socket_id;
+
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
 	 */
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac89d15a4..987b83fb8 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket IDs at a very high number */
+#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b)) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1015,6 +1019,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	uint32_t next_socket_id = mcfg->next_socket_id;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket IDs\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id;
+
+	/* we hold a global mem hotplug writelock, so it's safe to increment */
+	mcfg->next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
@@ -1022,6 +1056,9 @@ rte_eal_malloc_heap_init(void)
 	unsigned int i;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign min socket ID to external heaps */
+		mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
 		/* assign names to default DPDK heaps */
 		for (i = 0; i < rte_socket_count(); i++) {
 			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v5 12/21] malloc: allow destroying heaps
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (12 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating malloc heaps Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
                         ` (8 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

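Add an API to destroy a specified heap. Destruction fails if the heap
still has outstanding allocations or still contains memory segments.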

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
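The order of checks that decides whether a heap can be destroyed can be
sketched as a standalone model. This is only an illustration: the struct,
the function name and MODEL_MAX_NUMA_NODES are hypothetical stand-ins, and the
model returns a negative errno value where the patch returns -1 and sets
rte_errno.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_NUMA_NODES 8 /* assumption: stands in for RTE_MAX_NUMA_NODES */

struct model_heap {
	int socket_id;
	unsigned int alloc_count; /* outstanding allocations */
	const void *first;        /* NULL once all memory has been removed */
	const void *last;
};

/* Returns 0 if the heap may be destroyed, or a negative errno value. */
int
model_heap_destroy_check(const struct model_heap *heap)
{
	if (heap == NULL)
		return -ENOENT; /* lookup by name failed */
	if (heap->socket_id < MODEL_MAX_NUMA_NODES)
		return -EPERM;  /* internal heaps are reserved */
	if (heap->alloc_count != 0)
		return -EBUSY;  /* memory is still allocated from the heap */
	if (heap->first != NULL || heap->last != NULL)
		return -EBUSY;  /* memory segments are still present */
	return 0;
}
```

In practice this means a caller has to free every allocation and remove every
memory chunk from the heap before destroying it.
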
 lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++
 lib/librte_eal/common/malloc_heap.c        | 22 ++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 58 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 107 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index e326529d0..309bbbcc9 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
 
+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ *   from the heap has been freed back to the heap
+ *
+ * @note This function will return a failure result if not all memory segments
+ *   were removed from the heap prior to its destruction
+ *
+ * @param heap_name
+ *   Name of the heap to destroy.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     ENOENT - heap by the name of ``heap_name`` was not found
+ *     EPERM  - attempting to destroy reserved heap
+ *     EBUSY  - heap still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 987b83fb8..b51390210 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1049,6 +1049,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	return 0;
 }
 
+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+	if (heap->alloc_count != 0) {
+		RTE_LOG(ERR, EAL, "Heap is still in use\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->first != NULL || heap->last != NULL) {
+		RTE_LOG(ERR, EAL, "Heap still contains memory segments\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->total_size != 0)
+		RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+	/* zeroing the heap also resets its spinlock, i.e. drops the lock */
+	memset(heap, 0, sizeof(*heap));
+
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index eebee16dc..75278da3c 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 25967a7cb..286e748ef 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -313,6 +313,21 @@ rte_malloc_virt2iova(const void *addr)
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
 
+static struct malloc_heap *
+find_named_heap(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN))
+			return heap;
+	}
+	return NULL;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -363,3 +378,46 @@ rte_malloc_heap_create(const char *heap_name)
 
 	return ret;
 }
+
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* look up the heap by name */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to destroy internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	/* sanity checks done, now we can destroy the heap */
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_destroy(heap);
+
+	/* if we failed, lock is still active */
+	if (ret < 0)
+		rte_spinlock_unlock(&heap->lock);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 376f33bbb..27aac5bea 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_create;
+	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v5 13/21] malloc: allow adding memory to named heaps
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (13 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 12/21] malloc: allow destroying heaps Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 14/21] malloc: allow removing memory from " Anatoly Burakov
                         ` (7 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add an API to add externally allocated memory to a malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If an IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
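The per-page bookkeeping described above can be sketched as a standalone
model. It is an illustration only, not the DPDK code: model_seg,
model_fill_segs, MODEL_BAD_IOVA and the out/out_cap output buffer are
hypothetical, and the fbarray and memseg list setup is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_BAD_IOVA UINT64_MAX /* assumption: stands in for RTE_BAD_IOVA */

struct model_seg {
	uintptr_t addr; /* virtual address of the page */
	uint64_t iova;  /* IOVA of the page, or MODEL_BAD_IOVA */
	size_t len;     /* always one page */
};

/* Returns the number of pages described, or a negative errno value. */
int
model_fill_segs(uintptr_t va, size_t len, size_t page_sz,
		const uint64_t *iova_addrs, unsigned int n_pages,
		struct model_seg *out, unsigned int out_cap)
{
	unsigned int n, i;

	/* page size must be a non-zero power of two */
	if (va == 0 || page_sz == 0 || (page_sz & (page_sz - 1)) != 0)
		return -EINVAL;
	n = (unsigned int)(len / page_sz);
	/* n_pages is only checked when an IOVA table is supplied */
	if (iova_addrs != NULL && n != n_pages)
		return -EINVAL;
	if (n > out_cap)
		return -ENOSPC; /* model-only: output buffer too small */

	for (i = 0; i < n; i++) {
		out[i].addr = va + (uintptr_t)i * page_sz;
		out[i].iova = iova_addrs == NULL ?
				MODEL_BAD_IOVA : iova_addrs[i];
		out[i].len = page_sz;
	}
	return (int)n;
}
```

As in the patch, the page count is derived from len / page_sz, so a length
that is not a multiple of the page size is rounded down.
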
 lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 74 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 51 +++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 169 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 309bbbcc9..fb5b6e2f7 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,45 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Add memory chunk to a heap with specified name.
+ *
+ * @note Multiple memory chunks can be added to the same heap
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ *   malloc heap. Failing to do so will result in undefined behavior, up to and
+ *   including segmentation faults.
+ *
+ * @note Calling this function will erase any contents already present at the
+ *   supplied memory address.
+ *
+ * @param heap_name
+ *   Name of the heap to add memory chunk to
+ * @param va_addr
+ *   Start of virtual area to add to the heap
+ * @param len
+ *   Length of virtual area to add to the heap
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to add memory to a reserved heap
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b51390210..36bfd53d3 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1019,6 +1019,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	char fbarray_name[RTE_FBARRAY_NAME_LEN];
+	struct rte_memseg_list *msl = NULL;
+	struct rte_fbarray *arr;
+	size_t seg_len = n_pages * page_sz;
+	unsigned int i;
+
+	/* first, find a free memseg list */
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+		if (tmp->base_va == NULL) {
+			msl = tmp;
+			break;
+		}
+	}
+	if (msl == NULL) {
+		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+			heap->name, va_addr);
+
+	/* create the backing fbarray */
+	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+			sizeof(struct rte_memseg)) < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+		return -1;
+	}
+	arr = &msl->memseg_arr;
+
+	/* fbarray created, fill it up */
+	for (i = 0; i < n_pages; i++) {
+		struct rte_memseg *ms;
+
+		rte_fbarray_set_used(arr, i);
+		ms = rte_fbarray_get(arr, i);
+		ms->addr = RTE_PTR_ADD(va_addr, i * page_sz);
+		ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+		ms->hugepage_sz = page_sz;
+		ms->len = page_sz;
+		ms->nchannel = rte_memory_get_nchannel();
+		ms->nrank = rte_memory_get_nrank();
+		ms->socket_id = heap->socket_id;
+	}
+
+	/* set up the memseg list */
+	msl->base_va = va_addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = heap->socket_id;
+	msl->len = seg_len;
+	msl->version = 0;
+	msl->external = 1;
+
+	/* erase contents of new memory */
+	memset(va_addr, 0, seg_len);
+
+	/* now, add newly minted memory to the malloc heap */
+	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+	heap->total_size += seg_len;
+
+	/* all done! */
+	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+			heap->name, va_addr);
+
+	return 0;
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 75278da3c..237ce9dc2 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 286e748ef..acdbd92a2 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -328,6 +328,57 @@ find_named_heap(const char *name)
 	return NULL;
 }
 
+int
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int n;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL ||
+			page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		/* hotplug lock is not held yet, so return directly */
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot add memory to internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	n = len / page_sz;
+	if (n != n_pages && iova_addrs != NULL) {
+		rte_errno = EINVAL;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+			page_sz);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 27aac5bea..02254042c 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -321,6 +321,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_create;
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_memory_add;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v5 14/21] malloc: allow removing memory from named heaps
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (14 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
                         ` (6 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add an API to remove memory from specified heaps. Removal first
checks that all elements within the region are free, and that the
region is exactly the one originally added to the heap (by comparing
its length to the length of the memory addressed by the underlying
memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
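The lookup and the two checks described above can be sketched as a standalone
model. This is an illustration under assumptions, not the DPDK code: the
model_* names are hypothetical, msl_len stands in for elem->msl->len, and an
element's own address plays the role of the chunk's virtual address, as it
does in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum model_state { MODEL_FREE, MODEL_BUSY };

struct model_elem {
	struct model_elem *next; /* elements are linked in address order */
	enum model_state state;
	size_t size;             /* size of this element */
	size_t msl_len;          /* length of the backing memseg list */
};

/* Returns 0 if the chunk at va_addr can be removed, else negative errno. */
int
model_remove_check(struct model_elem *first, void *va_addr, size_t len)
{
	struct model_elem *elem = first;

	/* find the element that starts exactly at va_addr */
	while (elem != NULL && (void *)elem != va_addr) {
		elem = elem->next;
		/* stop once we have gone past the requested address */
		if ((uintptr_t)elem > (uintptr_t)va_addr)
			return -ENOENT;
	}
	/* the chunk must match what was originally added */
	if (elem == NULL || elem->msl_len != len)
		return -ENOENT;
	/* a split or allocated element means the chunk is still in use */
	if (elem->state == MODEL_BUSY || elem->size != len)
		return -EBUSY;
	return 0;
}
```

A chunk is removable only while it is covered by a single free element of
the full length, which is why partial removal is not supported.
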
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++
 lib/librte_eal/common/malloc_heap.c        | 54 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 125 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index fb5b6e2f7..40bae4478 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -302,6 +302,33 @@ int __rte_experimental
 rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+/**
+ * Remove memory chunk from heap with specified name.
+ *
+ * @note Memory chunk being removed must be the same as one that was added;
+ *   partially removing memory chunks is not supported
+ *
+ * @note Memory area must not contain any allocated elements to allow its
+ *   removal from the heap
+ *
+ * @param heap_name
+ *   Name of the heap to remove memory from
+ * @param va_addr
+ *   Virtual address to remove from the heap
+ * @param len
+ *   Length of virtual area to remove from the heap
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to remove memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ *     EBUSY  - memory chunk still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 36bfd53d3..c52d84419 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1019,6 +1019,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+	struct malloc_heap *heap = elem->heap;
+	struct rte_memseg_list *msl;
+
+	msl = elem->msl;
+
+	/* this element can be removed */
+	malloc_elem_free_list_remove(elem);
+	malloc_elem_hide_region(elem, elem, len);
+
+	heap->total_size -= len;
+
+	memset(elem, 0, sizeof(*elem));
+
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -1093,6 +1119,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	return 0;
 }
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len)
+{
+	struct malloc_elem *elem = heap->first;
+
+	/* find element with specified va address */
+	while (elem != NULL && elem != va_addr) {
+		elem = elem->next;
+		/* stop if we've blown past our VA */
+		if (elem > (struct malloc_elem *)va_addr) {
+			rte_errno = ENOENT;
+			return -1;
+		}
+	}
+	/* check if element was found */
+	if (elem == NULL || elem->msl->len != len) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+	/* if element's size is not equal to segment len, segment is busy */
+	if (elem->state == ELEM_BUSY || elem->size != len) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+	return destroy_seg(elem, len);
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 237ce9dc2..e48996d52 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -43,6 +43,10 @@ int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index acdbd92a2..bfc49d0b7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -379,6 +379,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot remove memory from internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 02254042c..8c66d0be9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v5 15/21] malloc: allow attaching to external memory chunks
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (15 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 14/21] malloc: allow removing memory from " Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 16/21] malloc: allow detaching from external memory Anatoly Burakov
                         ` (5 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 83 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 112 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 40bae4478..793f9473a 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket,
  *
  * @note Multiple memory chunks can be added to the same heap
  *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling
+ *   ``rte_malloc_heap_memory_attach``.
+ *
  * @note Memory must be previously allocated for DPDK to be able to use it as a
  *   malloc heap. Failing to do so will result in undefined behavior, up to and
  *   including segmentation faults.
@@ -329,6 +333,30 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 int __rte_experimental
 rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Attach to a chunk of external memory already added by another process.
+ *
+ * @note This function must be called before any attempt is made to use an
+ *   already existing external memory chunk. This function does *not* need to
+ *   be called if a call to ``rte_malloc_heap_memory_add`` was made in the
+ *   current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to attach to
+ * @param len
+ *   Length of memory chunk to attach to
+ * @return
+ *   0 on successful attach
+ *   -1 on unsuccessful attach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to attach memory to a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index bfc49d0b7..5078235b1 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -418,6 +418,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+struct sync_mem_walk_arg {
+	void *va_addr;
+	size_t len;
+	int result;
+};
+
+static int
+attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct sync_mem_walk_arg *wa = arg;
+	size_t len = msl->page_sz * msl->memseg_arr.len;
+
+	if (msl->base_va == wa->va_addr &&
+			len == wa->len) {
+		struct rte_memseg_list *found_msl;
+		int msl_idx, ret;
+
+		/* msl is const */
+		msl_idx = msl - mcfg->memsegs;
+		found_msl = &mcfg->memsegs[msl_idx];
+
+		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+
+		if (ret < 0)
+			wa->result = -rte_errno;
+		else
+			wa->result = 0;
+		return 1;
+	}
+	return 0;
+}
+
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	struct sync_mem_walk_arg wa;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to attach to internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* find corresponding memseg list to attach to */
+	wa.va_addr = va_addr;
+	wa.len = len;
+	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+
+	/* we're already holding a read lock */
+	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+
+	if (wa.result < 0) {
+		rte_errno = -wa.result;
+		ret = -1;
+	} else {
+		ret = 0;
+	}
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 8c66d0be9..920852042 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_attach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v5 16/21] malloc: allow detaching from external memory
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (16 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 17/21] malloc: enable event callbacks for " Anatoly Burakov
                         ` (4 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add API to detach from existing chunk of external memory in a
process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
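Note for readers: the API leaves the ordering between detach and remove
to the user, so an application may want its own bookkeeping. The sketch
below is one possible way to track it and is not something DPDK does.
struct chunk_state and the chunk_*() helpers are hypothetical names,
and in a real multiprocess setup the state would have to live in shared
memory with suitable locking, which is omitted here.

```c
#include <errno.h>

/* Hypothetical application-side bookkeeping for one external memory chunk. */
struct chunk_state {
	int added;      /* non-zero once the owning process added the chunk */
	int n_attached; /* number of other processes currently attached */
};

/* Record that another process attached to the chunk. */
static int
chunk_attach(struct chunk_state *cs)
{
	if (!cs->added)
		return -ENOENT; /* nothing to attach to yet */
	cs->n_attached++;
	return 0;
}

/* Record that another process detached from the chunk. */
static int
chunk_detach(struct chunk_state *cs)
{
	if (!cs->added || cs->n_attached == 0)
		return -ENOENT;
	cs->n_attached--;
	return 0;
}

/* Allow removal only once every other process has detached. */
static int
chunk_remove(struct chunk_state *cs)
{
	if (!cs->added)
		return -ENOENT;
	if (cs->n_attached > 0)
		return -EBUSY; /* someone still has to detach first */
	cs->added = 0;
	return 0;
}
```

With this in place, the owning process would call the real
rte_malloc_heap_memory_remove only after chunk_remove() returns 0.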
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 31 +++++++++++++++++-----
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 793f9473a..7249e6aae 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
  * @note Memory area must not contain any allocated elements to allow its
  *   removal from the heap
  *
+ * @note All other processes must detach from the memory chunk prior to it being
+ *   removed from the heap.
+ *
  * @param heap_name
  *   Name of the heap to remove memory from
  * @param va_addr
@@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 int __rte_experimental
 rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Detach from a chunk of external memory in secondary process.
+ *
+ * @note This function must be called before any attempt is made to remove
+ *   external memory from the heap in another process. This function does *not*
+ *   need to be called if ``rte_malloc_heap_memory_remove`` will be called in
+ *   the current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to detach from
+ * @param len
+ *   Length of memory chunk to detach from
+ * @return
+ *   0 on successful detach
+ *   -1 on unsuccessful detach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to detach memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5078235b1..72e42b337 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -422,10 +422,11 @@ struct sync_mem_walk_arg {
 	void *va_addr;
 	size_t len;
 	int result;
+	bool attach;
 };
 
 static int
-attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct sync_mem_walk_arg *wa = arg;
@@ -440,7 +441,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		if (wa->attach)
+			ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		else
+			ret = rte_fbarray_detach(&found_msl->memseg_arr);
 
 		if (ret < 0)
 			wa->result = -rte_errno;
@@ -451,8 +455,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 	return 0;
 }
 
-int
-rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+static int
+sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
@@ -475,20 +479,21 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 		ret = -1;
 		goto unlock;
 	}
-	/* we shouldn't be able to attach to internal heaps */
+	/* we shouldn't be able to sync to internal heaps */
 	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
 		rte_errno = EPERM;
 		ret = -1;
 		goto unlock;
 	}
 
-	/* find corresponding memseg list to attach to */
+	/* find corresponding memseg list to sync to */
 	wa.va_addr = va_addr;
 	wa.len = len;
 	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+	wa.attach = attach;
 
 	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
 
 	if (wa.result < 0) {
 		rte_errno = -wa.result;
@@ -501,6 +506,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, true);
+}
+
+int
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, false);
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 920852042..30583eef2 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -323,6 +323,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
 	rte_malloc_heap_memory_attach;
+	rte_malloc_heap_memory_detach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v5 17/21] malloc: enable event callbacks for external memory
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (17 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 16/21] malloc: allow detaching from external memory Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 18/21] test: add unit tests for external memory support Anatoly Burakov
                         ` (3 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: Hemant Agrawal, Shreyansh Jain, Maxime Coquelin, Tiwei Bie,
	Zhihong Wang, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, thomas, shahafs, arybchenko

When adding or removing external memory from the memory map, there
may be actions that need to be taken on account of this memory (e.g.
DMA mapping). Add support for triggering callbacks when adding,
removing, attaching or detaching external memory.

Some memory event callback handlers will need additional logic to
handle external memory regions. For example, virtio callback has to
completely ignore externally allocated memory, because there is no
way to find file descriptors backing the memory address in a
generic fashion. All other callbacks have also been adjusted to
handle RTE_BAD_IOVA as IOVA address, as this is one of the expected
use cases for external memory support.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
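Note for readers: the way the adjusted callbacks treat RTE_BAD_IOVA can
be modelled in isolation. This is a simplified sketch, not the driver
code. struct mock_seg, MOCK_BAD_IOVA and count_mappable() are
hypothetical names, and the DMA map/unmap call is replaced by a
counter. The point it illustrates is that a segment with a bad IOVA is
skipped, but the walk still advances past it so that later segments are
handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors RTE_BAD_IOVA, which is defined as (rte_iova_t)-1 */
#define MOCK_BAD_IOVA UINT64_MAX

/* Hypothetical stand-in for struct rte_memseg */
struct mock_seg {
	uint64_t iova;
	size_t len;
};

/* Walks the segments covering len bytes and returns how many of them
 * would be DMA-mapped. Segments are assumed to have non-zero length.
 */
static size_t
count_mappable(const struct mock_seg *ms, size_t len)
{
	size_t cur_len = 0, mapped = 0;

	while (cur_len < len) {
		if (ms->iova != MOCK_BAD_IOVA)
			mapped++; /* a real handler would map or unmap here */
		/* always advance, even past skipped segments */
		cur_len += ms->len;
		++ms;
	}
	return mapped;
}
```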
 drivers/bus/fslmc/fslmc_vfio.c                |  7 +++++
 .../net/virtio/virtio_user/virtio_user_dev.c  |  8 ++++++
 lib/librte_eal/common/malloc_heap.c           |  7 +++++
 lib/librte_eal/common/rte_malloc.c            | 27 ++++++++++++++++---
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 10 +++++--
 5 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 2e9244fb7..001852217 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -221,6 +221,13 @@ fslmc_memevent_cb(enum rte_mem_event type, const void *addr, size_t len,
 					"alloc" : "dealloc",
 				va, virt_addr, iova_addr, map_len);
 
+		/* iova_addr may be set to RTE_BAD_IOVA */
+		if (iova_addr == RTE_BAD_IOVA) {
+			DPAA2_BUS_DEBUG("Segment has invalid iova, skipping\n");
+			cur_len += map_len;
+			continue;
+		}
+
 		if (type == RTE_MEM_EVENT_ALLOC)
 			ret = fslmc_map_dma(virt_addr, iova_addr, map_len);
 		else
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 7df600b02..de813d0df 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -13,6 +13,8 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 
+#include <rte_eal_memconfig.h>
+
 #include "vhost.h"
 #include "virtio_user_dev.h"
 #include "../virtio_ethdev.h"
@@ -282,8 +284,14 @@ virtio_user_mem_event_cb(enum rte_mem_event type __rte_unused,
 						 void *arg)
 {
 	struct virtio_user_dev *dev = arg;
+	struct rte_memseg_list *msl;
 	uint16_t i;
 
+	/* ignore externally allocated memory */
+	msl = rte_mem_virt2memseg_list(addr);
+	if (msl->external)
+		return;
+
 	pthread_mutex_lock(&dev->mutex);
 
 	if (dev->started == false)
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index c52d84419..5883714ba 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1027,6 +1027,9 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 
 	msl = elem->msl;
 
+	/* notify all subscribers that a memory area is going to be removed */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, elem, len);
+
 	/* this element can be removed */
 	malloc_elem_free_list_remove(elem);
 	malloc_elem_hide_region(elem, elem, len);
@@ -1116,6 +1119,10 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
 			heap->name, va_addr);
 
+	/* notify all subscribers that a new memory area has been added */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+			va_addr, seg_len);
+
 	return 0;
 }
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72e42b337..2c19c2f87 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -25,6 +25,7 @@
 #include <rte_malloc.h>
 #include "malloc_elem.h"
 #include "malloc_heap.h"
+#include "eal_memalloc.h"
 
 
 /* Free the memory space back to heap */
@@ -441,15 +442,29 @@ sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		if (wa->attach)
+		if (wa->attach) {
 			ret = rte_fbarray_attach(&found_msl->memseg_arr);
-		else
+		} else {
+			/* notify all subscribers that a memory area is about to
+			 * be removed
+			 */
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+					msl->base_va, msl->len);
 			ret = rte_fbarray_detach(&found_msl->memseg_arr);
+		}
 
-		if (ret < 0)
+		if (ret < 0) {
 			wa->result = -rte_errno;
-		else
+		} else {
+			/* notify all subscribers that a new memory area was
+			 * added
+			 */
+			if (wa->attach)
+				eal_memalloc_mem_event_notify(
+						RTE_MEM_EVENT_ALLOC,
+						msl->base_va, msl->len);
 			wa->result = 0;
+		}
 		return 1;
 	}
 	return 0;
@@ -499,6 +514,10 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 		rte_errno = -wa.result;
 		ret = -1;
 	} else {
+		/* notify all subscribers that a new memory area was added */
+		if (attach)
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+					va_addr, len);
 		ret = 0;
 	}
 unlock:
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index fddbc3b54..d7268e4ce 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -509,7 +509,7 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	msl = rte_mem_virt2memseg_list(addr);
 
 	/* for IOVA as VA mode, no need to care for IOVA addresses */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
+	if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) {
 		uint64_t vfio_va = (uint64_t)(uintptr_t)addr;
 		if (type == RTE_MEM_EVENT_ALLOC)
 			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
@@ -523,13 +523,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	/* memsegs are contiguous in memory */
 	ms = rte_mem_virt2memseg(addr, msl);
 	while (cur_len < len) {
+		/* some memory segments may have invalid IOVA */
+		if (ms->iova == RTE_BAD_IOVA) {
+			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
+					ms->addr);
+			goto next;
+		}
 		if (type == RTE_MEM_EVENT_ALLOC)
 			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
 					ms->iova, ms->len, 1);
 		else
 			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
 					ms->iova, ms->len, 0);
-
+next:
 		cur_len += ms->len;
 		++ms;
 	}
-- 
2.17.1

* [dpdk-dev] [PATCH v5 18/21] test: add unit tests for external memory support
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (18 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 17/21] malloc: enable event callbacks for " Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 19/21] app/testpmd: add support for external memory Anatoly Burakov
                         ` (2 subsequent siblings)
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

Add simple unit tests to test external memory support.
The tests are pretty basic and mostly consist of checking
if invalid API calls are handled correctly, plus a simple
allocation/deallocation test for malloc and memzone.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
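Note for readers: the test builds a contiguous IOVA table and checks
that a page count differing from len / pgsz is rejected. A standalone
sketch of that arithmetic follows. fill_iova_table() is a hypothetical
helper, not part of the test or of DPDK, and uint64_t stands in for
rte_iova_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills iova[0..n_pages-1] with consecutive page addresses starting at
 * base. Returns 0 on success, or -1 if n_pages does not match len / pgsz.
 */
static int
fill_iova_table(uint64_t *iova, size_t n_pages, size_t len, size_t pgsz,
		uint64_t base)
{
	size_t i;

	if (pgsz == 0 || len % pgsz != 0 || n_pages != len / pgsz)
		return -1;
	for (i = 0; i < n_pages; i++)
		iova[i] = base + (uint64_t)i * pgsz;
	return 0;
}
```

Because consecutive entries differ by exactly one page, a table built
this way is IOVA-contiguous, which is what lets the memzone reservation
with RTE_MEMZONE_IOVA_CONTIG in the test succeed.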
 test/test/Makefile            |   1 +
 test/test/autotest_data.py    |  14 +-
 test/test/meson.build         |   1 +
 test/test/test_external_mem.c | 389 ++++++++++++++++++++++++++++++++++
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 test/test/test_external_mem.c

diff --git a/test/test/Makefile b/test/test/Makefile
index e6967bab6..074ac6e03 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c
 SRCS-y += test_reciprocal_division.c
 SRCS-y += test_reciprocal_division_perf.c
 SRCS-y += test_fbarray.c
+SRCS-y += test_external_mem.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py
index f68d9b111..51f8e1689 100644
--- a/test/test/autotest_data.py
+++ b/test/test/autotest_data.py
@@ -477,10 +477,16 @@
         "Report":  None,
     },
     {
-        "Name":    "Fbarray autotest",
-        "Command": "fbarray_autotest",
-        "Func":    default_autotest,
-        "Report":  None,
+	"Name":    "Fbarray autotest",
+	"Command": "fbarray_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
+    },
+    {
+	"Name":    "External memory autotest",
+	"Command": "external_mem_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
     },
     #
     #Please always keep all dump tests at the end and together!
diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6eca2..3abf02b71 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -155,6 +155,7 @@ test_names = [
 	'eventdev_common_autotest',
 	'eventdev_octeontx_autotest',
 	'eventdev_sw_autotest',
+	'external_mem_autotest',
 	'func_reentrancy_autotest',
 	'flow_classify_autotest',
 	'hash_scaling_autotest',
diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c
new file mode 100644
index 000000000..d0837aa35
--- /dev/null
+++ b/test/test/test_external_mem.c
@@ -0,0 +1,389 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+
+#include "test.h"
+
+#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */
+
+static int
+test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova,
+		int n_pages)
+{
+	static const char * const names[] = {
+		NULL, /* NULL name */
+		"",   /* empty name */
+		"this heap name is definitely way too long to be valid"
+	};
+	const char *valid_name = "valid heap name";
+	unsigned int i;
+
+	/* check invalid name handling */
+	for (i = 0; i < RTE_DIM(names); i++) {
+		const char *name = names[i];
+
+		/* these calls may fail for other reasons, so check errno */
+		if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Created heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Destroyed heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_get_socket(name) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Found socket for heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_add(name, addr, len,
+				NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Added memory to heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Removed memory from heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Attached memory to heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Detached memory from heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* do same as above, but with a valid name of a non-existent heap */
+
+	/* skip create call */
+	if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Destroyed non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_get_socket(valid_name) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Found socket for non-existent heap\n",
+				__func__, __LINE__);
+		goto fail;
+	}
+
+	/* these calls may fail for other reasons, so check errno */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Added memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Removed memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Attached memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Detached memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* create a valid heap but test other invalid parameters */
+	if (rte_malloc_heap_create(valid_name) != 0) {
+		printf("%s():%i: Failed to create valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero length */
+	if (rte_malloc_heap_memory_add(valid_name, addr, 0,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero address */
+	if (rte_malloc_heap_memory_add(valid_name, NULL, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* wrong page count */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* tests passed, destroy heap */
+	if (rte_malloc_heap_destroy(valid_name) != 0) {
+		printf("%s():%i: Failed to destroy valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	return 0;
+fail:
+	rte_malloc_heap_destroy(valid_name);
+	return -1;
+}
+
+static int
+test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages)
+{
+	const char *heap_name = "heap";
+	void *ptr = NULL;
+	int socket_id, i;
+	const struct rte_memzone *mz = NULL;
+
+	/* create heap */
+	if (rte_malloc_heap_create(heap_name) != 0) {
+		printf("%s():%i: Failed to create malloc heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* get socket ID corresponding to this heap */
+	socket_id = rte_malloc_heap_get_socket(heap_name);
+	if (socket_id < 0) {
+		printf("%s():%i: cannot find socket for external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* heap is empty, so any allocation should fail */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr != NULL) {
+		printf("%s():%i: Allocated from empty heap\n", __func__,
+			__LINE__);
+		goto fail;
+	}
+
+	/* add memory to heap */
+	if (rte_malloc_heap_memory_add(heap_name, addr, len,
+			iova, n_pages, pgsz) != 0) {
+		printf("%s():%i: Failed to add memory to heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check that we can get this memory from EAL now */
+	for (i = 0; i < n_pages; i++) {
+		const struct rte_memseg *ms;
+		void *cur = RTE_PTR_ADD(addr, pgsz * i);
+
+		ms = rte_mem_virt2memseg(cur, NULL);
+		if (ms == NULL) {
+			printf("%s():%i: Failed to retrieve memseg for external mem\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (ms->addr != cur) {
+			printf("%s():%i: VA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+		if (ms->iova != iova[i]) {
+			printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* allocate - this now should succeed */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr == NULL) {
+		printf("%s():%i: Failed to allocate from external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check if address is in expected range */
+	if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) {
+		printf("%s():%i: Allocated from unexpected address space\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* we've allocated something - removing memory should fail */
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 ||
+			rte_errno != EBUSY) {
+		printf("%s():%i: Removing memory succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) {
+		printf("%s():%i: Destroying heap succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* try allocating an IOVA-contiguous memzone - this should succeed
+	 * because we've set up a contiguous IOVA table.
+	 */
+	mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id,
+			RTE_MEMZONE_IOVA_CONTIG);
+	if (mz == NULL) {
+		printf("%s():%i: Failed to reserve memzone\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	rte_malloc_dump_stats(stdout, NULL);
+	rte_malloc_dump_heaps(stdout);
+
+	/* free memory - removing it should now succeed */
+	rte_free(ptr);
+	ptr = NULL;
+
+	rte_memzone_free(mz);
+	mz = NULL;
+
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) {
+		printf("%s():%i: Removing memory from heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) != 0) {
+		printf("%s():%i: Destroying heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	return 0;
+fail:
+	rte_memzone_free(mz);
+	rte_free(ptr);
+	/* even if something failed, attempt to clean up */
+	rte_malloc_heap_memory_remove(heap_name, addr, len);
+	rte_malloc_heap_destroy(heap_name);
+
+	return -1;
+}
+
+/* we need to test attach/detach in secondary processes. */
+static int
+test_external_mem(void)
+{
+	size_t len = EXTERNAL_MEM_SZ;
+	size_t pgsz = RTE_PGSIZE_4K;
+	rte_iova_t iova[len / pgsz];
+	void *addr;
+	int ret, n_pages;
+
+	/* create external memory area */
+	n_pages = RTE_DIM(iova);
+	addr = mmap(NULL, len, PROT_WRITE | PROT_READ,
+			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED) {
+		printf("%s():%i: Failed to create dummy memory area\n",
+			__func__, __LINE__);
+		return -1;
+	}
+	for (int i = 0; i < n_pages; i++) {
+		/* arbitrary IOVA */
+		rte_iova_t tmp = 0x100000000 + i * pgsz;
+		iova[i] = tmp;
+	}
+
+	ret = test_invalid_param(addr, len, pgsz, iova, n_pages);
+	ret |= test_basic(addr, len, pgsz, iova, n_pages);
+
+	munmap(addr, len);
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem);
-- 
2.17.1

* [dpdk-dev] [PATCH v5 19/21] app/testpmd: add support for external memory
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (19 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 18/21] test: add unit tests for external memory support Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 20/21] doc: add external memory feature to the release notes Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger, John McNamara,
	Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko

Currently, mempools can only be allocated either using native
DPDK memory, or anonymous memory. This patch will add two new
methods to allocate mempool using external memory (regular or
hugepage memory), and add documentation about it to testpmd
user guide.

It adds a new flag "--mp-alloc", with four possible values:
native (use regular DPDK allocator), anon (use anonymous
mempool), xmem (use externally allocated memory area), and
xmemhuge (use externally allocated hugepage memory area). Old
flag "--mp-anon" is kept for compatibility.

All external memory is added to the same external heap, but each
mempool will allocate and add its own new memory area.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/config.c                 |  21 +-
 app/test-pmd/parameters.c             |  23 +-
 app/test-pmd/testpmd.c                | 305 ++++++++++++++++++++++++--
 app/test-pmd/testpmd.h                |  13 +-
 doc/guides/testpmd_app_ug/run_app.rst |  12 +
 5 files changed, 349 insertions(+), 25 deletions(-)
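
Note on sizing: the page-count arithmetic used by calc_mem_size() in the
diff below can be tried outside of testpmd. What follows is a minimal
standalone sketch, not part of the patch. It assumes the caller already
knows the per-object size (testpmd obtains it from
rte_mempool_calc_obj_size()); the names estimate_extmem_size and
HDR_MEM_ALLOWANCE are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed allowance for mempool header chunks, same 32MB guess as the patch. */
#define HDR_MEM_ALLOWANCE ((uint64_t)32 << 20)

/*
 * Estimate how many bytes to map for nb_objs objects of obj_sz bytes each,
 * assuming that no object may straddle a page boundary.
 * Returns 0 on success, -1 if an object does not fit in one page or the
 * result cannot be represented.
 */
int
estimate_extmem_size(uint32_t nb_objs, uint64_t obj_sz, uint64_t pgsz,
		size_t *out)
{
	uint64_t objs_per_pg, n_pages, total, rem;

	assert(out != NULL);
	if (obj_sz == 0 || pgsz == 0 || obj_sz > pgsz)
		return -1;

	objs_per_pg = pgsz / obj_sz;
	/* round up: a partially filled last page still costs a whole page */
	n_pages = nb_objs / objs_per_pg + (nb_objs % objs_per_pg != 0);

	/* guard the multiplication and addition against 64-bit overflow */
	if (n_pages > (UINT64_MAX - HDR_MEM_ALLOWANCE) / pgsz)
		return -1;
	total = HDR_MEM_ALLOWANCE + n_pages * pgsz;

	/* round the total up to a multiple of the page size */
	rem = total % pgsz;
	if (rem != 0) {
		if (total > UINT64_MAX - (pgsz - rem))
			return -1;
		total += pgsz - rem;
	}

	if (total > SIZE_MAX)
		return -1;
	*out = (size_t)total;
	return 0;
}
```

For example, 1000 objects of 2432 bytes each (an arbitrary size picked for
illustration) on 2M pages fit 862 per page, so two pages are needed for
objects and the estimate comes to 32MB + 4MB = 36MB.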

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a0f934932..4789910b3 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2413,6 +2413,23 @@ fwd_config_setup(void)
 		simple_fwd_config_setup();
 }
 
+static const char *
+mp_alloc_to_str(uint8_t mode)
+{
+	switch (mode) {
+	case MP_ALLOC_NATIVE:
+		return "native";
+	case MP_ALLOC_ANON:
+		return "anon";
+	case MP_ALLOC_XMEM:
+		return "xmem";
+	case MP_ALLOC_XMEM_HUGE:
+		return "xmemhuge";
+	default:
+		return "invalid";
+	}
+}
+
 void
 pkt_fwd_config_display(struct fwd_config *cfg)
 {
@@ -2421,12 +2438,12 @@ pkt_fwd_config_display(struct fwd_config *cfg)
 	streamid_t sm_id;
 
 	printf("%s packet forwarding%s - ports=%d - cores=%d - streams=%d - "
-		"NUMA support %s, MP over anonymous pages %s\n",
+		"NUMA support %s, MP allocation mode: %s\n",
 		cfg->fwd_eng->fwd_mode_name,
 		retry_enabled == 0 ? "" : " with retry",
 		cfg->nb_fwd_ports, cfg->nb_fwd_lcores, cfg->nb_fwd_streams,
 		numa_support == 1 ? "enabled" : "disabled",
-		mp_anon != 0 ? "enabled" : "disabled");
+		mp_alloc_to_str(mp_alloc_type));
 
 	if (retry_enabled)
 		printf("TX retry num: %u, delay between TX retries: %uus\n",
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 9220e1c1b..b4016668c 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -190,6 +190,11 @@ usage(char* progname)
 	printf("  --vxlan-gpe-port=N: UPD port of tunnel VXLAN-GPE\n");
 	printf("  --mlockall: lock all memory\n");
 	printf("  --no-mlockall: do not lock all memory\n");
+	printf("  --mp-alloc <native|anon|xmem|xmemhuge>: mempool allocation method.\n"
+	       "    native: use regular DPDK memory to create and populate mempool\n"
+	       "    anon: use regular DPDK memory to create and anonymous memory to populate mempool\n"
+	       "    xmem: use anonymous memory to create and populate mempool\n"
+	       "    xmemhuge: use anonymous hugepage memory to create and populate mempool\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -625,6 +630,7 @@ launch_args_parse(int argc, char** argv)
 		{ "vxlan-gpe-port",		1, 0, 0 },
 		{ "mlockall",			0, 0, 0 },
 		{ "no-mlockall",		0, 0, 0 },
+		{ "mp-alloc",			1, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -743,7 +749,22 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "numa"))
 				numa_support = 1;
 			if (!strcmp(lgopts[opt_idx].name, "mp-anon")) {
-				mp_anon = 1;
+				mp_alloc_type = MP_ALLOC_ANON;
+			}
+			if (!strcmp(lgopts[opt_idx].name, "mp-alloc")) {
+				if (!strcmp(optarg, "native"))
+					mp_alloc_type = MP_ALLOC_NATIVE;
+				else if (!strcmp(optarg, "anon"))
+					mp_alloc_type = MP_ALLOC_ANON;
+				else if (!strcmp(optarg, "xmem"))
+					mp_alloc_type = MP_ALLOC_XMEM;
+				else if (!strcmp(optarg, "xmemhuge"))
+					mp_alloc_type = MP_ALLOC_XMEM_HUGE;
+				else
+					rte_exit(EXIT_FAILURE,
+						"mp-alloc %s invalid - must be: "
+						"native, anon, xmem or xmemhuge\n",
+						 optarg);
 			}
 			if (!strcmp(lgopts[opt_idx].name, "port-numa-config")) {
 				if (parse_portnuma_config(optarg))
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e552..a6a2dbdeb 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -27,6 +27,7 @@
 #include <rte_log.h>
 #include <rte_debug.h>
 #include <rte_cycles.h>
+#include <rte_malloc_heap.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_launch.h>
@@ -63,6 +64,22 @@
 
 #include "testpmd.h"
 
+#ifndef MAP_HUGETLB
+/* FreeBSD may not have MAP_HUGETLB (in fact, it probably doesn't) */
+#define HUGE_FLAG (0x40000)
+#else
+#define HUGE_FLAG MAP_HUGETLB
+#endif
+
+#ifndef MAP_HUGE_SHIFT
+/* older kernels (or FreeBSD) will not have this define */
+#define HUGE_SHIFT (26)
+#else
+#define HUGE_SHIFT MAP_HUGE_SHIFT
+#endif
+
+#define EXTMEM_HEAP_NAME "extmem"
+
 uint16_t verbose_level = 0; /**< Silent by default. */
 int testpmd_logtype; /**< Log type for testpmd logs */
 
@@ -88,9 +105,13 @@ uint8_t numa_support = 1; /**< numa enabled by default */
 uint8_t socket_num = UMA_NO_CONFIG;
 
 /*
- * Use ANONYMOUS mapped memory (might be not physically continuous) for mbufs.
+ * Select mempool allocation type:
+ * - native: use regular DPDK memory
+ * - anon: use regular DPDK memory to create mempool, but populate using
+ *         anonymous memory (may not be IOVA-contiguous)
+ * - xmem, xmemhuge: use externally allocated regular or hugepage memory
  */
-uint8_t mp_anon = 0;
+uint8_t mp_alloc_type = MP_ALLOC_NATIVE;
 
 /*
  * Store specified sockets on which memory pool to be used by ports
@@ -527,6 +548,216 @@ set_def_fwd_config(void)
 	set_default_fwd_ports_config();
 }
 
+/* extremely pessimistic estimation of memory required to create a mempool */
+static int
+calc_mem_size(uint32_t nb_mbufs, uint32_t mbuf_sz, size_t pgsz, size_t *out)
+{
+	unsigned int n_pages, mbuf_per_pg, leftover;
+	uint64_t total_mem, mbuf_mem, obj_sz;
+
+	/* there is no good way to predict how much space the mempool will
+	 * occupy because it will allocate chunks on the fly, and some of those
+	 * will come from default DPDK memory while some will come from our
+	 * external memory, so just assume 32MB will be enough for everyone.
+	 */
+	uint64_t hdr_mem = 32 << 20;
+
+	/* account for possible non-contiguousness */
+	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
+	if (obj_sz > pgsz) {
+		TESTPMD_LOG(ERR, "Object size is bigger than page size\n");
+		return -1;
+	}
+
+	mbuf_per_pg = pgsz / obj_sz;
+	leftover = (nb_mbufs % mbuf_per_pg) > 0;
+	n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
+
+	mbuf_mem = n_pages * pgsz;
+
+	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
+
+	if (total_mem > SIZE_MAX) {
+		TESTPMD_LOG(ERR, "Memory size too big\n");
+		return -1;
+	}
+	*out = (size_t)total_mem;
+
+	return 0;
+}
+
+static inline uint32_t
+bsf64(uint64_t v)
+{
+	return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+	if (v == 0)
+		return 0;
+	v = rte_align64pow2(v);
+	return bsf64(v);
+}
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+	/* as per mmap() manpage, all page sizes are log2 of page size
+	 * shifted by MAP_HUGE_SHIFT
+	 */
+	int log2 = log2_u64(page_sz);
+	return (log2 << HUGE_SHIFT);
+}
+
+static void *
+alloc_mem(size_t memsz, size_t pgsz, bool huge)
+{
+	void *addr;
+	int flags;
+
+	/* allocate anonymous memory, backed by hugepages if requested */
+	flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	if (huge)
+		flags |= HUGE_FLAG | pagesz_flags(pgsz);
+
+	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
+	if (addr == MAP_FAILED)
+		return NULL;
+
+	return addr;
+}
+
+struct extmem_param {
+	void *addr;
+	size_t len;
+	size_t pgsz;
+	rte_iova_t *iova_table;
+	unsigned int iova_table_len;
+};
+
+static int
+create_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, struct extmem_param *param,
+		bool huge)
+{
+	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
+			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
+	unsigned int n_pages, pgsz_idx;
+	size_t mem_sz, cur_pgsz;
+	rte_iova_t *iovas = NULL;
+	void *addr;
+	int ret;
+
+	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
+		/* skip anything that is too big */
+		if (pgsizes[pgsz_idx] > SIZE_MAX)
+			continue;
+
+		cur_pgsz = pgsizes[pgsz_idx];
+
+		/* if we were told not to allocate hugepages, override */
+		if (!huge)
+			cur_pgsz = sysconf(_SC_PAGESIZE);
+
+		ret = calc_mem_size(nb_mbufs, mbuf_sz, cur_pgsz, &mem_sz);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot calculate memory size\n");
+			return -1;
+		}
+
+		/* allocate our memory */
+		addr = alloc_mem(mem_sz, cur_pgsz, huge);
+
+		/* if we couldn't allocate memory with a specified page size,
+		 * that doesn't mean we can't do it with other page sizes, so
+		 * try another one.
+		 */
+		if (addr == NULL)
+			continue;
+
+		/* store IOVA addresses for every page in this memory area */
+		n_pages = mem_sz / cur_pgsz;
+
+		iovas = malloc(sizeof(*iovas) * n_pages);
+
+		if (iovas == NULL) {
+			TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n");
+			goto fail;
+		}
+		/* lock memory if it's not huge pages */
+		if (!huge)
+			mlock(addr, mem_sz);
+
+		break;
+	}
+	/* if we couldn't allocate anything */
+	if (iovas == NULL)
+		return -1;
+
+	param->addr = addr;
+	param->len = mem_sz;
+	param->pgsz = cur_pgsz;
+	param->iova_table = iovas;
+	param->iova_table_len = n_pages;
+
+	return 0;
+fail:
+	if (iovas)
+		free(iovas);
+	if (addr)
+		munmap(addr, mem_sz);
+
+	return -1;
+}
+
+static int
+setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge)
+{
+	struct extmem_param param = {};
+	int socket_id, ret;
+
+	/* check if our heap exists */
+	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+	if (socket_id < 0) {
+		/* create our heap */
+		ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot create heap\n");
+			return -1;
+		}
+	}
+
+	ret = create_extmem(nb_mbufs, mbuf_sz, &param, huge);
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot create memory area\n");
+		return -1;
+	}
+
+	/* we now have a valid memory area, so add it to heap */
+	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
+			param.addr, param.len, param.iova_table,
+			param.iova_table_len, param.pgsz);
+
+	/* when using VFIO, memory is automatically mapped for DMA by EAL */
+
+	/* not needed any more */
+	free(param.iova_table);
+
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot add memory to heap\n");
+		munmap(param.addr, param.len);
+		return -1;
+	}
+
+	/* success */
+
+	TESTPMD_LOG(DEBUG, "Allocated %zuMB of external memory\n",
+			param.len >> 20);
+
+	return 0;
+}
+
 /*
  * Configuration initialisation done once at init time.
  */
@@ -545,27 +776,59 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
 		pool_name, nb_mbuf, mbuf_seg_size, socket_id);
 
-	if (mp_anon != 0) {
-		rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
-			mb_size, (unsigned) mb_mempool_cache,
-			sizeof(struct rte_pktmbuf_pool_private),
-			socket_id, 0);
-		if (rte_mp == NULL)
-			goto err;
+	switch (mp_alloc_type) {
+	case MP_ALLOC_NATIVE:
+		{
+			/* wrapper to rte_mempool_create() */
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+				mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+			break;
+		}
+	case MP_ALLOC_ANON:
+		{
+			rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				mb_size, (unsigned int) mb_mempool_cache,
+				sizeof(struct rte_pktmbuf_pool_private),
+				socket_id, 0);
+			if (rte_mp == NULL)
+				goto err;
+
+			if (rte_mempool_populate_anon(rte_mp) == 0) {
+				rte_mempool_free(rte_mp);
+				rte_mp = NULL;
+				goto err;
+			}
+			rte_pktmbuf_pool_init(rte_mp, NULL);
+			rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
+			break;
+		}
+	case MP_ALLOC_XMEM:
+	case MP_ALLOC_XMEM_HUGE:
+		{
+			int heap_socket;
+			bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE;
 
-		if (rte_mempool_populate_anon(rte_mp) == 0) {
-			rte_mempool_free(rte_mp);
-			rte_mp = NULL;
-			goto err;
+			if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0)
+				rte_exit(EXIT_FAILURE, "Could not create external memory\n");
+
+			heap_socket =
+				rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+			if (heap_socket < 0)
+				rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");
+
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+					mb_mempool_cache, 0, mbuf_seg_size,
+					heap_socket);
+			break;
+		}
+	default:
+		{
+			rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n");
 		}
-		rte_pktmbuf_pool_init(rte_mp, NULL);
-		rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
-	} else {
-		/* wrapper to rte_mempool_create() */
-		TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
-				rte_mbuf_best_mempool_ops());
-		rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-			mb_mempool_cache, 0, mbuf_seg_size, socket_id);
 	}
 
 err:
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a1f661472..65e0cec90 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -69,6 +69,16 @@ enum {
 	PORT_TOPOLOGY_LOOP,
 };
 
+enum {
+	MP_ALLOC_NATIVE, /**< allocate and populate mempool natively */
+	MP_ALLOC_ANON,
+	/**< allocate mempool natively, but populate using anonymous memory */
+	MP_ALLOC_XMEM,
+	/**< allocate and populate mempool using anonymous memory */
+	MP_ALLOC_XMEM_HUGE
+	/**< allocate and populate mempool using anonymous hugepage memory */
+};
+
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 /**
  * The data structure associated with RX and TX packet burst statistics
@@ -304,7 +314,8 @@ extern uint8_t  numa_support; /**< set by "--numa" parameter */
 extern uint16_t port_topology; /**< set by "--port-topology" parameter */
 extern uint8_t no_flush_rx; /**<set by "--no-flush-rx" parameter */
 extern uint8_t flow_isolate_all; /**< set by "--flow-isolate-all */
-extern uint8_t  mp_anon; /**< set by "--mp-anon" parameter */
+extern uint8_t  mp_alloc_type;
+/**< set by "--mp-anon" or "--mp-alloc" parameter */
 extern uint8_t no_link_check; /**<set by "--disable-link-check" parameter */
 extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f301c2b6f..67a8532a4 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -498,3 +498,15 @@ The commandline options are:
 *   ``--no-mlockall``
 
     Disable locking all memory.
+
+*   ``--mp-alloc <native|anon|xmem|xmemhuge>``
+
+    Select mempool allocation mode:
+
+    * native: create and populate mempool using native DPDK memory
+    * anon: create mempool using native DPDK memory, but populate using
+      anonymous memory
+    * xmem: create and populate mempool using externally and anonymously
+      allocated area
+    * xmemhuge: create and populate mempool using externally and anonymously
+      allocated hugepage area
-- 
2.17.1

* [dpdk-dev] [PATCH v5 20/21] doc: add external memory feature to the release notes
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (20 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 19/21] app/testpmd: add support for external memory Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
  22 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

Document the addition of external memory support to DPDK.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5065ec1af..4248ff4f9 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,6 +67,11 @@ New Features
   SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
   vdev_netvsc, tap, and failsafe drivers combination.
 
+* **Added support for using externally allocated memory in DPDK.**
+
+  DPDK has gained support for creating new ``rte_malloc`` heaps referencing
+  memory that was created outside of DPDK's own page allocator, and using that
+  memory natively with any other DPDK library or data structure.
 
 API Changes
 -----------
-- 
2.17.1

* [dpdk-dev] [PATCH v5 21/21] doc: add external memory feature to programmer's guide
  2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
                         ` (21 preceding siblings ...)
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 20/21] doc: add external memory feature to the release notes Anatoly Burakov
@ 2018-09-26 11:22       ` Anatoly Burakov
  2018-09-26 15:19         ` Kovacevic, Marko
  22 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

Add a short chapter on usage of external memory in DPDK to the
Programmer's Guide.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      | 37 +++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d362c9209..f2191b695 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -213,6 +213,43 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
+Support for Externally Allocated Memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to use externally allocated memory in DPDK, using a set of malloc
+heap APIs. Support for externally allocated memory is implemented through
+overloading the socket ID - externally allocated heaps will have socket IDs
+that would be considered invalid under normal circumstances. Requesting an
+allocation to take place from a specified externally allocated memory is a
+matter of supplying the correct socket ID to the DPDK allocator, either directly
+(e.g. through a call to ``rte_malloc``) or indirectly (through data
+structure-specific allocation APIs such as ``rte_ring_create``).
+
+Since there is no way DPDK can verify whether memory area is available or valid,
+this responsibility falls on the shoulders of the user. All multiprocess
+synchronization is also the user's responsibility, as well as ensuring that all
+calls to add/attach/detach/remove memory are done in the correct order. It is
+not required to attach to a memory area in all processes - only attach to memory
+areas as needed.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Create a named heap
+* Add memory area(s) to the heap
+  * If IOVA table is not specified, IOVA addresses will be assumed to be
+    unavailable, and DMA mappings will not be performed
+  * Other processes must attach to the memory area before they can use it
+* Get socket ID used for the heap
+* Use normal DPDK allocation procedures, using supplied socket ID
+* If memory area is no longer needed, it can be removed from the heap
+  * Other processes must detach from this memory area before it can be removed
+* If heap is no longer needed, remove it
+  * Socket ID will become invalid and will not be reused
+
+For more information, please refer to ``rte_malloc`` API documentation,
+specifically the ``rte_malloc_heap_*`` family of function calls.
+
 PCI Access
 ~~~~~~~~~~
 
-- 
2.17.1

* Re: [dpdk-dev] [PATCH v5 21/21] doc: add external memory feature to programmer's guide
  2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
@ 2018-09-26 15:19         ` Kovacevic, Marko
  2018-09-26 16:00           ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Kovacevic, Marko @ 2018-09-26 15:19 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Mcnamara, John, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, Wiles, Keith,
	Richardson, Bruce, thomas, shreyansh.jain, shahafs, arybchenko

> Add a short chapter on usage of external memory in DPDK to the
> Programmer's Guide.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  .../prog_guide/env_abstraction_layer.rst      | 37 +++++++++++++++++++
>  1 file changed, 37 insertions(+)

> +The expected workflow is as follows:
> +
> +* Get a pointer to memory area
> +* Create a named heap
> +* Add memory area(s) to the heap
> +  * If IOVA table is not specified, IOVA addresses will be assumed to be
> +    unavailable, and DMA mappings will not be performed
> +  * Other processes must attach to the memory area before they can use it
> +* Get socket ID used for the heap
> +* Use normal DPDK allocation procedures, using supplied socket ID
> +* If memory area is no longer needed, it can be removed from the heap
> +  * Other processes must detach from this memory area before it can be removed
> +* If heap is no longer needed, remove it
> +  * Socket ID will become invalid and will not be reused


Hi Anatoly,

I'm getting an error when doing

make-doc-guides-html

/dpdk/doc/guides/prog_guide/env_abstraction_layer.rst:241: WARNING: Unexpected indentation.
/dpdk/doc/guides/prog_guide/env_abstraction_layer.rst:242: WARNING: Block quote ends without a blank line; unexpected unindent.

This is due to the indentation of your inner bullet points: instead of two
spaces, use four, for all four sub-bullets that you have.

Thanks,
Marko K

* Re: [dpdk-dev] [PATCH v5 21/21] doc: add external memory feature to programmer's guide
  2018-09-26 15:19         ` Kovacevic, Marko
@ 2018-09-26 16:00           ` Burakov, Anatoly
  2018-09-26 16:17             ` Kovacevic, Marko
  0 siblings, 1 reply; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-26 16:00 UTC (permalink / raw)
  To: Kovacevic, Marko, dev
  Cc: Mcnamara, John, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, Wiles, Keith,
	Richardson, Bruce, thomas, shreyansh.jain, shahafs, arybchenko

On 26-Sep-18 4:19 PM, Kovacevic, Marko wrote:
>> Add a short chapter on usage of external memory in DPDK to the
>> Programmer's Guide.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>   .../prog_guide/env_abstraction_layer.rst      | 37 +++++++++++++++++++
>>   1 file changed, 37 insertions(+)
> 
>> +The expected workflow is as follows:
>> +
>> +* Get a pointer to memory area
>> +* Create a named heap
>> +* Add memory area(s) to the heap
>> +  * If IOVA table is not specified, IOVA addresses will be assumed to be
>> +    unavailable, and DMA mappings will not be performed
>> +  * Other processes must attach to the memory area before they can use it
>> +* Get socket ID used for the heap
>> +* Use normal DPDK allocation procedures, using supplied socket ID
>> +* If memory area is no longer needed, it can be removed from the heap
>> +  * Other processes must detach from this memory area before it can be removed
>> +* If heap is no longer needed, remove it
>> +  * Socket ID will become invalid and will not be reused
> 
> 
> Hi Anatoly,
> 
> I'm getting an error when doing
>
> make-doc-guides-html
>
> /dpdk/doc/guides/prog_guide/env_abstraction_layer.rst:241: WARNING: Unexpected indentation.
> /dpdk/doc/guides/prog_guide/env_abstraction_layer.rst:242: WARNING: Block quote ends without a blank line; unexpected unindent.
>
> This is due to the indentation of your inner bullet points: instead of two
> spaces, use four, for all four sub-bullets that you have.
> 
> Thanks,
> Marko K
> 
Hi Marko,

Those are supposed to be sub-bullet points, i.e. 2nd level. Does it have 
to be six?

-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [PATCH v5 21/21] doc: add external memory feature to programmer's guide
  2018-09-26 16:00           ` Burakov, Anatoly
@ 2018-09-26 16:17             ` Kovacevic, Marko
  0 siblings, 0 replies; 225+ messages in thread
From: Kovacevic, Marko @ 2018-09-26 16:17 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Mcnamara, John, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, Wiles, Keith,
	Richardson, Bruce, thomas, shreyansh.jain, shahafs, arybchenko

> > Hi Anatoly,
> >
> > I'm getting an error when doing
> >
> > make-doc-guides-html
> >
> > /dpdk/doc/guides/prog_guide/env_abstraction_layer.rst:241: WARNING: Unexpected indentation.
> > /dpdk/doc/guides/prog_guide/env_abstraction_layer.rst:242: WARNING: Block quote ends without a blank line; unexpected unindent.
> >
> > This is due to the indentation of your inner bullet points: instead of
> > two spaces, use four, for all four sub-bullets that you have.
> >
> > Thanks,
> > Marko K
> >
> Hi Marko,
> 
> Those are supposed to be sub-bullet points, i.e. 2nd level. Does it have to be
> six?
> 
> --
> Thanks,
> Anatoly

Yeah, I understand that they are supposed to be sub-bullets, but the output is
skewed when you only have 2 spaces for them. Instead of 2, use 4, not 6.


* [dpdk-dev] [PATCH v6 00/21] Support externally allocated memory in DPDK
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
@ 2018-09-27 10:40         ` Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                             ` (21 more replies)
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 01/21] mem: add length to memseg list Anatoly Burakov
                           ` (20 subsequent siblings)
  21 siblings, 22 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:40 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have IDs in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create a heap and add memory to it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
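
As a rough illustration of the first step (obtaining a memory area and
describing it page by page), here is a minimal standalone sketch in plain
POSIX C. It is not code from this series: struct ext_area and the
ext_area_* helpers are hypothetical, and the IOVA values are arbitrary
placeholders in the same spirit as the unit test. The DPDK calls are only
mentioned in a comment, using the argument order seen in the testpmd patch.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc with strict -std= */
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical descriptor of one external memory area (not a DPDK type). */
struct ext_area {
	void *addr;           /* start of the mapping */
	size_t len;           /* length in bytes, a multiple of pgsz */
	size_t pgsz;          /* page size used to describe the area */
	uint64_t *iova;       /* one entry per page (rte_iova_t in DPDK) */
	unsigned int n_pages; /* number of entries in iova[] */
};

/* Map len bytes of anonymous memory and fill in a per-page IOVA table.
 * base_iova is a placeholder: real IOVAs must come from whatever owns
 * the memory. Returns 0 on success, -1 on failure.
 */
int
ext_area_create(struct ext_area *a, size_t len, size_t pgsz,
		uint64_t base_iova)
{
	size_t n_pages, i;

	assert(a != NULL);
	if (len == 0 || pgsz == 0 || len % pgsz != 0)
		return -1;
	n_pages = len / pgsz;
	if (n_pages > UINT_MAX || n_pages > SIZE_MAX / sizeof(*a->iova))
		return -1;

	a->iova = malloc(n_pages * sizeof(*a->iova));
	if (a->iova == NULL)
		return -1;

	a->addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (a->addr == MAP_FAILED) {
		free(a->iova);
		a->iova = NULL;
		a->addr = NULL;
		return -1;
	}

	/* consecutive pages get consecutive (made-up) IOVA addresses */
	for (i = 0; i < n_pages; i++)
		a->iova[i] = base_iova + (uint64_t)i * pgsz;

	a->len = len;
	a->pgsz = pgsz;
	a->n_pages = (unsigned int)n_pages;

	/* A DPDK application would now hand the area over, e.g.:
	 *   rte_malloc_heap_create(name);
	 *   rte_malloc_heap_memory_add(name, a->addr, a->len,
	 *                              a->iova, a->n_pages, a->pgsz);
	 *   socket_id = rte_malloc_heap_get_socket(name);
	 * and then allocate using that socket ID.
	 */
	return 0;
}

/* Unmap the area and free the table. Call this only after the memory has
 * been removed from any heap it was added to.
 */
void
ext_area_destroy(struct ext_area *a)
{
	assert(a != NULL);
	if (a->addr != NULL)
		munmap(a->addr, a->len);
	free(a->iova);
	a->addr = NULL;
	a->iova = NULL;
	a->len = 0;
	a->n_pages = 0;
}
```

The descriptor keeps the address, length, page size and IOVA table together
because all of them are passed when adding memory to a heap, and the same
address and length are needed again when removing it.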

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.

Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments

v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes

v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
  IOVA-contiguousness when dealing with externally allocated memory

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
  comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (21):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: add function to check if socket is external
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  malloc: enable event callbacks for external memory
  test: add unit tests for external memory support
  app/testpmd: add support for external memory
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide

 app/test-pmd/config.c                         |  21 +-
 app/test-pmd/parameters.c                     |  23 +-
 app/test-pmd/testpmd.c                        | 305 ++++++++++++-
 app/test-pmd/testpmd.h                        |  13 +-
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  37 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  28 +-
 doc/guides/testpmd_app_ug/run_app.rst         |  12 +
 drivers/bus/fslmc/fslmc_vfio.c                |  14 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 .../net/virtio/virtio_user/virtio_user_dev.c  |   8 +
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   9 +-
 lib/librte_eal/common/include/rte_malloc.h    | 192 ++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_elem.c           |  10 +-
 lib/librte_eal/common/malloc_heap.c           | 316 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 429 +++++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  27 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   8 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  57 ++-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 ++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 47 files changed, 1913 insertions(+), 139 deletions(-)
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1

* [dpdk-dev] [PATCH v6 01/21] mem: add length to memseg list
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
@ 2018-09-27 10:40         ` Anatoly Burakov
  2018-09-27 11:05           ` Shreyansh Jain
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
                           ` (19 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:40 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, shreyansh.jain, shahafs, arybchenko,
	alejandro.lucero

Previously, calculating the length of the memory area covered by a
memseg list required multiplying the page size by the length of the
fbarray backing that memseg list. This is not obvious and unnecessarily
low level, so store the length in the memseg list itself.
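
As a rough illustration of the difference, here is a minimal standalone
sketch. The `msl_len_sketch` and `fbarray_len_sketch` types are
simplified stand-ins for `struct rte_memseg_list` and
`struct rte_fbarray` (only the fields touched here), and both helper
names are hypothetical; none of this is the actual EAL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_fbarray: only the slot count is modelled. */
struct fbarray_len_sketch {
	unsigned int len; /* number of slots (one page each) */
};

/* Stand-in for struct rte_memseg_list with the new 'len' field. */
struct msl_len_sketch {
	void *base_va;    /* start of the VA area */
	size_t len;       /* length of the VA area, set when it is reserved */
	uint64_t page_sz; /* page size for all segments in this list */
	struct fbarray_len_sketch memseg_arr;
};

/* Old way: derive the length from the page size and the fbarray length. */
static size_t
sketch_len_derived(const struct msl_len_sketch *msl)
{
	assert(msl != NULL);
	return (size_t)msl->page_sz * msl->memseg_arr.len;
}

/* New way: the end address follows directly from the stored length. */
static void *
sketch_end_va(const struct msl_len_sketch *msl)
{
	assert(msl != NULL);
	return (char *)msl->base_va + msl->len;
}
```

For a list whose area was fully populated at setup time, both
computations agree; the stored field simply removes the need for callers
to know how the list is backed.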

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/bus/pci/linux/pci.c                       | 2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
 lib/librte_eal/common/eal_common_memory.c         | 5 ++---
 lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
 6 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 static int
 find_max_end_va(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t sz = msl->memseg_arr.len * msl->page_sz;
+	size_t sz = msl->len;
 	void *end_va = RTE_PTR_ADD(msl->base_va, sz);
 	void **max_va = arg;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
 		}
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
+		msl->len = internal_config.memory;
 		msl->socket_id = 0;
 
 		/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
 
 	/* a memseg list was specified, check if it's the right one */
 	start = msl->base_va;
-	end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+	end = RTE_PTR_ADD(start, msl->len);
 
 	if (addr < start || addr >= end)
 		return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
 		msl = &mcfg->memsegs[msl_idx];
 
 		start = msl->base_va;
-		end = RTE_PTR_ADD(start,
-				(size_t)msl->page_sz * msl->memseg_arr.len);
+		end = RTE_PTR_ADD(start, msl->len);
 		if (addr >= start && addr < end)
 			break;
 	}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
+	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	int msl_idx, seg_idx, ret, dir_fd = -1;
 
 	start_addr = (uintptr_t) msl->base_va;
-	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+	end_addr = start_addr + msl->len;
 
 	if ((uintptr_t)wa->ms->addr < start_addr ||
 			(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 		return -1;
 	}
 	local_msl->base_va = primary_msl->base_va;
+	local_msl->len = primary_msl->len;
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
 		msl->socket_id = 0;
+		msl->len = internal_config.memory;
 
 		/* populate memsegs. each memseg is one page long */
 		for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
 		if (msl->memseg_arr.count > 0)
 			continue;
 		/* this is an unused list, deallocate it */
-		mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
 
-- 
2.17.1

* [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-09-27 10:40         ` Anatoly Burakov
  2018-09-27 11:03           ` Shreyansh Jain
  2018-09-29  0:09           ` Yongseok Koh
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
                           ` (18 subsequent siblings)
  21 siblings, 2 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:40 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas, alejandro.lucero

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.

All current calls to memseg walk functions were adjusted to
ignore external segments where it made sense.
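
The pattern applied to those callbacks can be sketched in isolation as
follows. This is a simplified model, assuming a plain array of lists
instead of the shared memory config; `msl_walk_sketch`,
`sketch_list_walk` and `sketch_internal_size` are hypothetical names,
and the walker here just propagates the callback's return value rather
than reproducing the exact return convention of the real walk
functions.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct rte_memseg_list: only what the callback needs. */
struct msl_walk_sketch {
	size_t len;            /* length of the area covered by the list */
	unsigned int external; /* 1 if the list points to external memory */
};

typedef int (*sketch_walk_t)(const struct msl_walk_sketch *msl, void *arg);

/* Simplified walker: call func on each list, stop on a nonzero return. */
static int
sketch_list_walk(const struct msl_walk_sketch *lists, size_t n,
		sketch_walk_t func, void *arg)
{
	size_t i;

	assert(func != NULL);
	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback that accounts only for internal (non-external) memory. */
static int
sketch_internal_size(const struct msl_walk_sketch *msl, void *arg)
{
	size_t *total = arg;

	if (msl->external)
		return 0; /* skip this list, but keep walking */

	*total += msl->len;
	return 0;
}
```

Returning 0 for an external list continues the walk without touching
the accumulator, which is the same shape as the `if (msl->external)
return 0;` checks added throughout this patch.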

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
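
The matching rule can be modelled on its own as below. This is only a
sketch: `msl_pagesz_sketch`, `sketch_min_page_size` and
`SKETCH_SOCKET_ID_ANY` are hypothetical stand-ins (the latter assumed to
mirror `SOCKET_ID_ANY`), the lists are passed as a plain array, and the
fallback page size is a parameter rather than a call to getpagesize().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* assumed stand-in for SOCKET_ID_ANY */

/* Stand-in for struct rte_memseg_list: fields relevant to the choice. */
struct msl_pagesz_sketch {
	int socket_id;         /* NUMA node, or ID of an external heap */
	size_t page_sz;        /* page size for all segments in the list */
	unsigned int external; /* 1 if the list points to external memory */
};

/*
 * Pick the smallest page size usable for the requested socket ID:
 * - an exact socket ID match counts, for native or external memory;
 * - with "any socket", only native memory is considered.
 * If nothing matches, return the caller-supplied fallback.
 */
static size_t
sketch_min_page_size(const struct msl_pagesz_sketch *lists, size_t n,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n == 0);
	for (i = 0; i < n; i++) {
		bool valid = lists[i].socket_id == socket_id;

		valid = valid || (socket_id == SKETCH_SOCKET_ID_ANY &&
				!lists[i].external);

		if (valid && lists[i].page_sz < min)
			min = lists[i].page_sz;
	}
	return min == SIZE_MAX ? fallback : min;
}
```

With this rule, a small-page external heap no longer lowers the minimum
page size seen by a mempool created on a native socket or with no socket
preference.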

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        | 13 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_elem.c           | 10 ++++--
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/common/rte_malloc.c            |  2 +-
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 24 files changed, 134 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -107,6 +114,10 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+       a new flag indicating whether the memseg list refers to external memory.
+
 Removed Items
 -------------
 
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 	contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
 
 	/* if we're in IOVA as VA mode, or if we're in legacy mode with
-	 * hugepages, all elements are IOVA-contiguous.
+	 * hugepages, all elements are IOVA-contiguous. however, we can only
+	 * make these assumptions about internal memory - externally allocated
+	 * segments have to be checked.
 	 */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA ||
-			(internal_config.legacy_mem && rte_eal_has_hugepages()))
+	if (!elem->msl->external &&
+			(rte_eal_iova_mode() == RTE_IOVA_VA ||
+				(internal_config.legacy_mem &&
+					rte_eal_has_hugepages())))
 		return RTE_PTR_DIFF(data_end, contig_seg_start);
 
 	cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
 	if (elem == NULL)
 		return RTE_BAD_IOVA;
 
-	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+	if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return (uintptr_t) addr;
 
 	ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1


* [dpdk-dev] [PATCH v6 03/21] malloc: index heaps using heap ID rather than NUMA node
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (2 preceding siblings ...)
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 13:01           ` Alejandro Lucero
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
                           ` (17 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bruce Richardson, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, shreyansh.jain, shahafs, arybchenko,
	alejandro.lucero

Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is the NUMA
node's index within the detected NUMA node list. Heap IDs for
external heaps will be assigned in order of their creation.
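
As an illustration only (not part of the patch), the socket-ID-to-heap-ID
lookup can be modelled in standalone C. The names sketch_heap,
sketch_socket_to_heap_id, sketch_heap_for_socket and the in_use flag are
hypothetical; the real lookup is malloc_socket_to_heap_id() in the diff
below, which scans mcfg->malloc_heaps. The sketch also shows the -1 check a
caller would want before indexing the heap array.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_HEAPS 32 /* mirrors RTE_MAX_HEAPS introduced by this patch */

/* Hypothetical stand-in for struct malloc_heap; only the fields we need. */
struct sketch_heap {
	unsigned int socket_id;
	int in_use; /* assumption: not in the patch; marks populated slots */
};

/*
 * Linear scan from socket ID to heap index, in the spirit of
 * malloc_socket_to_heap_id(). Returns the heap index, or -1 if no
 * populated heap carries that socket ID.
 */
static int
sketch_socket_to_heap_id(const struct sketch_heap *heaps,
		unsigned int socket_id)
{
	int i;

	assert(heaps != NULL);
	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		if (heaps[i].in_use && heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}

/* Resolve a heap pointer, refusing to index the array with -1. */
static const struct sketch_heap *
sketch_heap_for_socket(const struct sketch_heap *heaps,
		unsigned int socket_id)
{
	int idx = sketch_socket_to_heap_id(heaps, socket_id);

	return idx < 0 ? NULL : &heaps[idx];
}
```

With detected NUMA nodes 0 and 2, for example, the heaps would sit at
indices 0 and 1, so socket ID 2 resolves to heap ID 1.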

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_base                            |  1 +
 config/rte_config.h                           |  1 +
 .../common/include/rte_eal_memconfig.h        |  4 +-
 .../common/include/rte_malloc_heap.h          |  1 +
 lib/librte_eal/common/malloc_heap.c           | 98 +++++++++++++------
 lib/librte_eal/common/malloc_heap.h           |  3 +
 lib/librte_eal/common/rte_malloc.c            | 41 +++++---
 7 files changed, 106 insertions(+), 43 deletions(-)

diff --git a/config/common_base b/config/common_base
index 155c7d40e..b52770b27 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 567051b9c..5dd2ac1ad 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
 #define RTE_BUILD_SHARED_LIB
 
 /* EAL defines */
+#define RTE_MAX_HEAPS 32
 #define RTE_MAX_MEMSEG_LISTS 128
 #define RTE_MAX_MEMSEG_PER_LIST 8192
 #define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 6baa6854f..d7920a4e0 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
 
-	/* Heaps of Malloc per socket */
-	struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+	/* Heaps of Malloc */
+	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {
 
 	unsigned alloc_count;
 	size_t total_size;
+	unsigned int socket_id;
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..1d1e35708 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 	return check_flag & flags;
 }
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (heap->socket_id == socket_id)
+			return i;
+	}
+	return -1;
+}
+
 /*
  * Expand the heap with a memory area.
  */
@@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct rte_memseg_list *found_msl;
 	struct malloc_heap *heap;
-	int msl_idx;
+	int msl_idx, heap_idx;
 
 	if (msl->external)
 		return 0;
 
-	heap = &mcfg->malloc_heaps[msl->socket_id];
+	heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+	heap = &mcfg->malloc_heaps[heap_idx];
 
 	/* msl is const, so find it */
 	msl_idx = msl - mcfg->memsegs;
@@ -111,6 +127,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
+	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -561,12 +578,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 
 /* this will try lower page sizes first */
 static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
-		unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+	int socket_id;
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -584,12 +603,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
 	 * we may still be able to allocate memory from appropriate page sizes,
 	 * we just need to request more memory first.
 	 */
+
+	socket_id = rte_socket_id_by_idx(heap_id);
+	/*
+	 * if socket ID is negative, we cannot find a socket ID for this heap -
+	 * which means it's an external heap. those can have unexpected page
+	 * sizes, so if the user asked to allocate from there - assume user
+	 * knows what they're doing, and allow allocating from there with any
+	 * page size flags.
+	 */
+	if (socket_id < 0)
+		size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
 	ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
 	if (ret != NULL)
 		goto alloc_unlock;
 
-	if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
-			contig)) {
+	/* if socket ID is invalid, this is an external heap */
+	if (socket_id < 0)
+		goto alloc_unlock;
+
+	if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+			bound, contig)) {
 		ret = heap_alloc(heap, type, size, flags, align, bound, contig);
 
 		/* this should have succeeded */
@@ -605,7 +640,7 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, heap_id, i;
 	void *ret;
 
 	/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +655,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
-			contig);
+	ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+			bound, contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
-	/* try other heaps */
+	/* try other heaps. we are only iterating through native DPDK sockets,
+	 * so external heaps won't be included.
+	 */
 	for (i = 0; i < (int) rte_socket_count(); i++) {
-		cur_socket = rte_socket_id_by_idx(i);
-		if (cur_socket == socket)
+		if (i == heap_id)
 			continue;
-		ret = heap_alloc_on_socket(type, size, cur_socket, flags,
-				align, bound, contig);
+		ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+				bound, contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -643,11 +681,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 }
 
 static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
-		size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+		unsigned int flags, size_t align, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -665,7 +703,7 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		size_t align, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, i, cur_socket, heap_id;
 	void *ret;
 
 	/* return NULL if align is not power-of-2 */
@@ -680,11 +718,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+	ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
 			contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
@@ -694,8 +734,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		cur_socket = rte_socket_id_by_idx(i);
 		if (cur_socket == socket)
 			continue;
-		ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
-				align, contig);
+		ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+				contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -760,7 +800,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* ...of which we can't avail if we are in legacy mode, or if this is an
 	 * externally allocated segment.
 	 */
-	if (internal_config.legacy_mem || msl->external)
+	if (internal_config.legacy_mem || (msl->external > 0))
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
@@ -917,7 +957,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 int
 malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +995,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f);
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
 int
 rte_eal_malloc_heap_init(void);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 47ca5a742..73d6df31d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx, ret = -1;
 
-	if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
-		return -1;
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
 
-	return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+	heap_idx = malloc_socket_to_heap_id(socket);
+	if (heap_idx < 0)
+		goto unlock;
+
+	ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+			socket_stats);
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
 }
 
 /*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int idx;
 
-	for (idx = 0; idx < rte_socket_count(); idx++) {
-		unsigned int socket = rte_socket_id_by_idx(idx);
-		fprintf(f, "Heap on socket %i:\n", socket);
-		malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		fprintf(f, "Heap id: %u\n", idx);
+		malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
 	}
 
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
 /*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
 void
 rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 {
-	unsigned int socket;
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int heap_id;
 	struct rte_malloc_socket_stats sock_stats;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
 	/* Iterate through all initialised heaps */
-	for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
-		if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
-			continue;
+	for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 
-		fprintf(f, "Socket:%u\n", socket);
+		malloc_heap_get_stats(heap, &sock_stats);
+
+		fprintf(f, "Heap id:%u\n", heap_id);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
 		fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
 	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	return;
 }
 
-- 
2.17.1


* [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (3 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 13:14           ` Alejandro Lucero
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 05/21] flow_classify: " Anatoly Burakov
                           ` (16 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
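
A minimal sketch of the change in semantics, assuming a hypothetical
external heap whose socket ID is 256 (the actual values are not fixed by
this patch). The helper names old_socket_ok and new_socket_ok are made up
for illustration; SKETCH_MAX_NUMA_NODES mirrors the RTE_MAX_NUMA_NODES=8
default from config/common_base.

```c
#define SKETCH_SOCKET_ID_ANY (-1) /* same value as DPDK's SOCKET_ID_ANY */
#define SKETCH_MAX_NUMA_NODES 8   /* default RTE_MAX_NUMA_NODES */

/* Old-style check: anything at or above the NUMA node limit is rejected. */
static int
old_socket_ok(int socket_id)
{
	return socket_id == SKETCH_SOCKET_ID_ANY ||
		(socket_id >= 0 && socket_id < SKETCH_MAX_NUMA_NODES);
}

/*
 * New-style check: only negative values other than SOCKET_ID_ANY are
 * rejected up front. Whether a non-negative ID really names a heap is
 * left to the memory subsystem to decide at allocation time.
 */
static int
new_socket_ok(int socket_id)
{
	return socket_id == SKETCH_SOCKET_ID_ANY || socket_id >= 0;
}
```

Under the old check a caller passing 256 would be turned away before malloc
ever saw the request; under the new one the request reaches malloc, which
fails it only if no heap carries that socket ID.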

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - ``socket_id`` parameter across the entire DPDK has gained additional
+    meaning, as some socket ID's will now be representing externally allocated
+    memory. No changes will be required for existing code as backwards
+    compatibility will be kept, and those who do not use this feature will not
+    see these extra socket ID's. Any new API's must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether socket ID is a valid one.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1


* [dpdk-dev] [PATCH v6 05/21] flow_classify: do not check for invalid socket ID
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (4 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 16:14           ` Iremonger, Bernard
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 06/21] pipeline: " Anatoly Burakov
                           ` (15 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: Bernard Iremonger, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_flow_classify/rte_flow_classify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_FLOW_CLASSIFY_LOG(ERR,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1


* [dpdk-dev] [PATCH v6 06/21] pipeline: do not check for invalid socket ID
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (5 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 05/21] flow_classify: " Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 07/21] sched: " Anatoly Burakov
                           ` (14 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_pipeline/rte_pipeline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_LOG(ERR, PIPELINE,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1


* [dpdk-dev] [PATCH v6 07/21] sched: do not check for invalid socket ID
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (6 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 06/21] pipeline: " Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
                           ` (13 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_sched/rte_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
 		return -1;
 
 	/* socket */
-	if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+	if (params->socket < 0)
 		return -3;
 
 	/* rate */
-- 
2.17.1


* [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (7 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 07/21] sched: " Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
                           ` (12 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

We will need to refer to external heaps in some way. While we use
heap IDs internally, the external API needs something more
user-friendly, so we will use a string to uniquely identify a
heap.
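
For illustration, the default naming scheme used for DPDK-internal heaps in
the diff below ("socket_%i") can be sketched as a standalone helper. The
function name sketch_default_heap_name is hypothetical, and the explicit
truncation check is an addition of this sketch rather than something the
patch does.

```c
#include <stdio.h>

#define SKETCH_HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN */

/*
 * Format the default name for an internal heap into buf.
 * Returns 0 on success, -1 if the name did not fit or formatting failed.
 */
static int
sketch_default_heap_name(char *buf, size_t len, int socket_id)
{
	int n = snprintf(buf, len, "socket_%i", socket_id);

	/* snprintf returns the length it wanted to write, excluding the NUL */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass a buffer of SKETCH_HEAP_NAME_MAX_LEN bytes; the heap for
NUMA node 1 would then be named "socket_1".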

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst          |  1 +
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 17 ++++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
 * eal: EAL library ABI version was changed due to previously announced work on
        supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
        a new flag indicating whether the memseg list refers to external memory.
+       Structure ``malloc_heap`` now has a ``name`` string member.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	size_t total_size;
 	unsigned int socket_id;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 73e478076..ac89d15a4 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1020,6 +1019,22 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign names to default DPDK heaps */
+		for (i = 0; i < rte_socket_count(); i++) {
+			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+			char heap_name[RTE_HEAP_NAME_MAX_LEN];
+			int socket_id = rte_socket_id_by_idx(i);
+
+			snprintf(heap_name, sizeof(heap_name) - 1,
+					"socket_%i", socket_id);
+			strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+			heap->socket_id = socket_id;
+		}
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1


* [dpdk-dev] [PATCH v6 09/21] malloc: add function to query socket ID of named heap
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (8 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 10/21] malloc: add function to check if socket is external Anatoly Burakov
                           ` (11 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

When we create external heaps, each will have its own "fake"
socket ID, so add a function that maps a heap name to its
socket ID.
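
The name-to-socket-ID mapping can be modelled outside DPDK roughly as
follows. This is a sketch under stated assumptions, not the patch's code:
sketch_named_heap and sketch_heap_get_socket are hypothetical names, the
error code is returned through an int pointer instead of rte_errno, locking
is omitted, and the socket ID 256 used in the usage note is invented.

```c
#include <errno.h>
#include <string.h>

#define SKETCH_MAX_HEAPS 32        /* mirrors RTE_MAX_HEAPS */
#define SKETCH_HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN */

/* Hypothetical stand-in for the name and socket_id fields of malloc_heap. */
struct sketch_named_heap {
	char name[SKETCH_HEAP_NAME_MAX_LEN];
	int socket_id;
};

/*
 * Look up a heap by name and return its socket ID.
 * On failure returns -1 and stores EINVAL (NULL, empty, or over-long
 * name) or ENOENT (no such heap) in *err.
 */
static int
sketch_heap_get_socket(const struct sketch_named_heap *heaps,
		const char *name, int *err)
{
	int i;

	/* a valid name is non-empty and NUL-terminated within the limit */
	if (name == NULL || name[0] == '\0' ||
			memchr(name, '\0', SKETCH_HEAP_NAME_MAX_LEN) == NULL) {
		*err = EINVAL;
		return -1;
	}
	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		if (strncmp(name, heaps[i].name,
				SKETCH_HEAP_NAME_MAX_LEN) == 0)
			return heaps[i].socket_id;
	}
	*err = ENOENT;
	return -1;
}
```

In an application, the value returned by the real
rte_malloc_heap_get_socket() would then be passed as the socket argument to
socket-aware allocators such as rte_malloc_socket().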

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 37 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..8870732a6 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,20 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Find socket ID corresponding to a named heap.
+ *
+ * @param name
+ *   Heap name to find socket ID for
+ * @return
+ *   Socket ID in case of success (a non-negative number)
+ *   -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``name`` was NULL, empty or too long
+ *     ENOENT - heap identified by the name ``name`` was not found
+ */
+int __rte_experimental
+rte_malloc_heap_get_socket(const char *name);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72632da56..b807dfe09 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <sys/queue.h>
 
+#include <rte_errno.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
 #include <rte_eal.h>
@@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f)
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
+int
+rte_malloc_heap_get_socket(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int idx;
+	int ret;
+
+	if (name == NULL ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) {
+			heap = tmp;
+			break;
+		}
+	}
+
+	if (heap != NULL) {
+		ret = heap->socket_id;
+	} else {
+		rte_errno = ENOENT;
+		ret = -1;
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bbb0..d8f9665b8 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v6 10/21] malloc: add function to check if socket is external
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (9 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating malloc heaps Anatoly Burakov
                           ` (10 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, alejandro.lucero

An API is needed to check whether a particular socket ID belongs
to an internal or external heap. The primary user of this is the
mempool allocator, because the usual assumption that memory is
IOVA-contiguous in IOVA as VA mode does not hold for externally
allocated memory.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
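To make the mempool change easier to review, here is a small standalone
sketch of the decision it introduces. It only reproduces the two boolean
expressions from rte_mempool_populate_default(); the demo_* names are
hypothetical, and the inputs (no_contig, IOVA mode, hugepage
availability) are passed in as plain flags instead of being queried from
EAL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two flags the mempool code derives. */
struct demo_populate_flags {
	bool no_pageshift;
	bool try_contig;
};

/*
 * is_external follows the return convention of the new API:
 * 1 for external memory, 0 for internal DPDK memory.
 */
static struct demo_populate_flags
demo_populate_decide(bool no_contig, bool iova_as_va, bool has_hugepages,
		int is_external)
{
	struct demo_populate_flags f;
	bool external;

	assert(is_external == 0 || is_external == 1);
	external = (is_external == 1);

	/* IOVA as VA only implies IOVA-contiguity for internal memory */
	f.no_pageshift = no_contig || (!external && iova_as_va);
	/* external memory may be contiguous even without hugepages */
	f.try_contig = !no_contig && !f.no_pageshift &&
			(has_hugepages || external);
	return f;
}
```

For an internal socket in IOVA as VA mode this yields no_pageshift set
and try_contig clear, as before the patch. For an external socket in the
same mode, no_pageshift is clear and a contiguous reservation is
attempted first.
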
 lib/librte_eal/common/include/rte_malloc.h | 15 +++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 25 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 lib/librte_mempool/rte_mempool.c           | 22 ++++++++++++++++---
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8870732a6..403271ddc 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -277,6 +277,21 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_get_socket(const char *name);
 
+/**
+ * Check if a given socket ID refers to externally allocated memory.
+ *
+ * @note Passing SOCKET_ID_ANY will return 0.
+ *
+ * @param socket_id
+ *   Socket ID to check
+ * @return
+ *   1 if socket ID refers to externally allocated memory
+ *   0 if socket ID refers to internal DPDK memory
+ *   -1 if socket ID is invalid
+ */
+int __rte_experimental
+rte_malloc_heap_socket_is_external(int socket_id);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b807dfe09..fa81d7862 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -220,6 +220,31 @@ rte_malloc_heap_get_socket(const char *name)
 	return ret;
 }
 
+int
+rte_malloc_heap_socket_is_external(int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int idx;
+	int ret = -1;
+
+	if (socket_id == SOCKET_ID_ANY)
+		return 0;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if ((int)tmp->socket_id == socket_id) {
+			/* external memory always has large socket ID's */
+			ret = tmp->socket_id >= RTE_MAX_NUMA_NODES;
+			break;
+		}
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d8f9665b8..bd60506af 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2ed539f01..683b216f9 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -428,12 +428,18 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	rte_iova_t iova;
 	unsigned mz_id, n;
 	int ret;
-	bool no_contig, try_contig, no_pageshift;
+	bool no_contig, try_contig, no_pageshift, external;
 
 	ret = mempool_ops_alloc_once(mp);
 	if (ret != 0)
 		return ret;
 
+	/* check if we can retrieve a valid socket ID */
+	ret = rte_malloc_heap_socket_is_external(mp->socket_id);
+	if (ret < 0)
+		return -EINVAL;
+	external = ret;
+
 	/* mempool must not be populated */
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
@@ -481,9 +487,19 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	 * in one contiguous chunk as well (otherwise we might end up wasting a
 	 * 1G page on a 10MB memzone). If we fail to get enough contiguous
 	 * memory, then we'll go and reserve space page-by-page.
+	 *
+	 * We also have to take into account the fact that memory that we're
+	 * going to allocate from can belong to an externally allocated memory
+	 * area, in which case the assumption of IOVA as VA mode being
+	 * synonymous with IOVA contiguousness will not hold. We should also try
+	 * to go for contiguous memory even if we're in no-huge mode, because
+	 * external memory may in fact be IOVA-contiguous.
 	 */
-	no_pageshift = no_contig || rte_eal_iova_mode() == RTE_IOVA_VA;
-	try_contig = !no_contig && !no_pageshift && rte_eal_has_hugepages();
+	external = rte_malloc_heap_socket_is_external(mp->socket_id) == 1;
+	no_pageshift = no_contig ||
+			(!external && rte_eal_iova_mode() == RTE_IOVA_VA);
+	try_contig = !no_contig && !no_pageshift &&
+			(rte_eal_has_hugepages() || external);
 
 	if (no_pageshift) {
 		pg_sz = 0;
-- 
2.17.1

* [dpdk-dev] [PATCH v6 11/21] malloc: allow creating malloc heaps
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (10 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 10/21] malloc: add function to check if socket is external Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 12/21] malloc: allow destroying heaps Anatoly Burakov
                           ` (9 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to allow creating new malloc heaps. They will be assigned
socket IDs at or above RTE_MAX_NUMA_NODES, to avoid clashing with the
socket IDs of internal heaps.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst        |  2 +
 .../common/include/rte_eal_memconfig.h        |  3 ++
 lib/librte_eal/common/include/rte_malloc.h    | 19 +++++++
 lib/librte_eal/common/malloc_heap.c           | 37 +++++++++++++
 lib/librte_eal/common/malloc_heap.h           |  3 ++
 lib/librte_eal/common/rte_malloc.c            | 52 +++++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  1 +
 7 files changed, 117 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
        supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
        a new flag indicating whether the memseg list refers to external memory.
        Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+       Structure ``rte_mem_config`` has been extended to contain the next
+       socket ID for externally allocated memory segments.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
 	/* Heaps of Malloc */
 	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
+	/* next socket ID for external malloc heap */
+	int next_socket_id;
+
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
 	 */
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac89d15a4..987b83fb8 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b)) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1015,6 +1019,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	uint32_t next_socket_id = mcfg->next_socket_id;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id;
+
+	/* we hold a global mem hotplug writelock, so it's safe to increment */
+	mcfg->next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
@@ -1022,6 +1056,9 @@ rte_eal_malloc_heap_init(void)
 	unsigned int i;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign min socket ID to external heaps */
+		mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
 		/* assign names to default DPDK heaps */
 		for (i = 0; i < rte_socket_count(); i++) {
 			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v6 12/21] malloc: allow destroying heaps
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (11 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating malloc heaps Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
                           ` (8 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to destroy specified heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
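A standalone sketch of the checks performed before a heap is destroyed,
for illustration only. It folds the public and internal functions into
one and leaves out locking; the demo_* names, the reduced structure and
DEMO_MAX_NUMA_NODES are assumptions, and the error is returned through
an out parameter rather than rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed stand-in for RTE_MAX_NUMA_NODES. */
#define DEMO_MAX_NUMA_NODES 8

/* Reduced model of a malloc heap: only the fields destroy looks at. */
struct demo_dheap {
	char name[32];
	unsigned int socket_id;
	unsigned int alloc_count;
	void *first;
	void *last;
	size_t total_size;
};

/* Returns 0 on success, or -1 with *err set to EPERM or EBUSY. */
static int
demo_heap_destroy(struct demo_dheap *heap, int *err)
{
	assert(heap != NULL && err != NULL);

	/* internal (per-NUMA-node) heaps must never be destroyed */
	if (heap->socket_id < DEMO_MAX_NUMA_NODES) {
		*err = EPERM;
		return -1;
	}
	/* outstanding allocations keep the heap alive */
	if (heap->alloc_count != 0) {
		*err = EBUSY;
		return -1;
	}
	/* memory chunks must be removed before destruction */
	if (heap->first != NULL || heap->last != NULL) {
		*err = EBUSY;
		return -1;
	}
	/* wiping the slot also clears the name, marking it as free */
	memset(heap, 0, sizeof(*heap));
	return 0;
}
```

The final memset is what makes the slot reusable: heap creation treats
any entry with an empty name as free.
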
 lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++
 lib/librte_eal/common/malloc_heap.c        | 22 ++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 58 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 107 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index e326529d0..309bbbcc9 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
 
+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ *   from the heap has been freed back to the heap
+ *
+ * @note This function will return a failure result if not all memory segments
+ *   were removed from the heap prior to its destruction
+ *
+ * @param heap_name
+ *   Name of the heap to destroy.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     ENOENT - heap by the name of ``heap_name`` was not found
+ *     EPERM  - attempting to destroy reserved heap
+ *     EBUSY  - heap still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 987b83fb8..b51390210 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1049,6 +1049,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	return 0;
 }
 
+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+	if (heap->alloc_count != 0) {
+		RTE_LOG(ERR, EAL, "Heap is still in use\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->first != NULL || heap->last != NULL) {
+		RTE_LOG(ERR, EAL, "Heap still contains memory segments\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->total_size != 0)
+		RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+	/* after this, the lock will be dropped */
+	memset(heap, 0, sizeof(*heap));
+
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index eebee16dc..75278da3c 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 25967a7cb..286e748ef 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -313,6 +313,21 @@ rte_malloc_virt2iova(const void *addr)
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
 
+static struct malloc_heap *
+find_named_heap(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN))
+			return heap;
+	}
+	return NULL;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -363,3 +378,46 @@ rte_malloc_heap_create(const char *heap_name)
 
 	return ret;
 }
+
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to destroy internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	/* sanity checks done, now we can destroy the heap */
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_destroy(heap);
+
+	/* if we failed, lock is still active */
+	if (ret < 0)
+		rte_spinlock_unlock(&heap->lock);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 376f33bbb..27aac5bea 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_create;
+	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v6 13/21] malloc: allow adding memory to named heaps
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (12 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 12/21] malloc: allow destroying heaps Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 14/21] malloc: allow removing memory from " Anatoly Burakov
                           ` (7 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to add externally allocated memory to malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
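A standalone sketch of the parameter validation and per-page segment
setup described above, for illustration only. It fills a caller-supplied
array instead of an fbarray-backed memseg list, and does not touch the
heap itself. The demo_* names are hypothetical, and DEMO_BAD_IOVA is a
stand-in for RTE_BAD_IOVA (assumed here to be an all-ones 64-bit value).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RTE_BAD_IOVA. */
#define DEMO_BAD_IOVA UINT64_MAX

/* Reduced model of a memseg: one entry per page. */
struct demo_seg {
	void *addr;
	uint64_t iova;
	size_t len;
};

static int
demo_is_pow2(size_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Describe the area [va, va + len) as page-sized segments.
 * Returns the number of segments written, or -1 with *err set to
 * EINVAL (bad parameters) or ENOSPC (segs[] is too small).
 */
static int
demo_fill_segments(void *va, size_t len, const uint64_t *iovas,
		unsigned int n_pages, size_t page_sz,
		struct demo_seg *segs, unsigned int max_segs, int *err)
{
	size_t n, i;

	assert(segs != NULL && err != NULL);

	if (va == NULL || !demo_is_pow2(page_sz)) {
		*err = EINVAL;
		return -1;
	}
	n = len / page_sz;
	/* the IOVA table, if given, must cover every page */
	if (iovas != NULL && n != n_pages) {
		*err = EINVAL;
		return -1;
	}
	if (n > max_segs) {
		*err = ENOSPC;
		return -1;
	}
	for (i = 0; i < n; i++) {
		segs[i].addr = (char *)va + i * page_sz;
		segs[i].iova = iovas == NULL ? DEMO_BAD_IOVA : iovas[i];
		segs[i].len = page_sz;
	}
	return (int)n;
}
```

As in the patch, n_pages is only checked when an IOVA table is supplied;
without one, the page count is derived from len and page_sz alone.
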
 lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 74 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 51 +++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 169 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 309bbbcc9..fb5b6e2f7 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,45 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Add memory chunk to a heap with specified name.
+ *
+ * @note Multiple memory chunks can be added to the same heap
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ *   malloc heap. Failing to do so will result in undefined behavior, up to and
+ *   including segmentation faults.
+ *
+ * @note Calling this function will erase any contents already present at the
+ *   supplied memory address.
+ *
+ * @param heap_name
+ *   Name of the heap to add memory chunk to
+ * @param va_addr
+ *   Start of virtual area to add to the heap
+ * @param len
+ *   Length of virtual area to add to the heap
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to add memory to a reserved heap
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b51390210..36bfd53d3 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1019,6 +1019,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	char fbarray_name[RTE_FBARRAY_NAME_LEN];
+	struct rte_memseg_list *msl = NULL;
+	struct rte_fbarray *arr;
+	size_t seg_len = n_pages * page_sz;
+	unsigned int i;
+
+	/* first, find a free memseg list */
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+		if (tmp->base_va == NULL) {
+			msl = tmp;
+			break;
+		}
+	}
+	if (msl == NULL) {
+		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+			heap->name, va_addr);
+
+	/* create the backing fbarray */
+	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+			sizeof(struct rte_memseg)) < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+		return -1;
+	}
+	arr = &msl->memseg_arr;
+
+	/* fbarray created, fill it up */
+	for (i = 0; i < n_pages; i++) {
+		struct rte_memseg *ms;
+
+		rte_fbarray_set_used(arr, i);
+		ms = rte_fbarray_get(arr, i);
+		ms->addr = RTE_PTR_ADD(va_addr, i * page_sz);
+		ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+		ms->hugepage_sz = page_sz;
+		ms->len = page_sz;
+		ms->nchannel = rte_memory_get_nchannel();
+		ms->nrank = rte_memory_get_nrank();
+		ms->socket_id = heap->socket_id;
+	}
+
+	/* set up the memseg list */
+	msl->base_va = va_addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = heap->socket_id;
+	msl->len = seg_len;
+	msl->version = 0;
+	msl->external = 1;
+
+	/* erase contents of new memory */
+	memset(va_addr, 0, seg_len);
+
+	/* now, add newly minted memory to the malloc heap */
+	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+	heap->total_size += seg_len;
+
+	/* all done! */
+	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+			heap->name, va_addr);
+
+	return 0;
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 75278da3c..237ce9dc2 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 286e748ef..acdbd92a2 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -328,6 +328,57 @@ find_named_heap(const char *name)
 	return NULL;
 }
 
+int
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int n;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL ||
+			page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		/* the hotplug lock is not held yet, so don't jump to unlock */
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot add memory to internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	n = len / page_sz;
+	if (n != n_pages && iova_addrs != NULL) {
+		rte_errno = EINVAL;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+			page_sz);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 27aac5bea..02254042c 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -321,6 +321,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_create;
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_memory_add;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v6 14/21] malloc: allow removing memory from named heaps
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (13 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
                           ` (6 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
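A rough, self-contained model of the removal check described above may
help when reading the diff. It is only a sketch: the toy_* names and
types are invented for illustration and are not the DPDK structures,
and the "chunk_len" field stands in for the length of the underlying
memseg list. As in the patch, a missing element or a length mismatch
with the chunk is reported as ENOENT, while an allocated or split
element is reported as EBUSY.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for malloc_elem and its memseg list; not the DPDK types. */
enum toy_state { TOY_FREE, TOY_BUSY };

struct toy_elem {
	struct toy_elem *next; /* next element, in ascending address order */
	enum toy_state state;  /* whether the element is allocated */
	size_t size;           /* size of this element */
	size_t chunk_len;      /* length of the chunk this element lives in */
};

/* Return 0 if the chunk at va of length len may be removed, else -errno. */
static int
toy_can_remove(struct toy_elem *first, void *va, size_t len)
{
	struct toy_elem *elem = first;

	/* find the element that starts exactly at va */
	while (elem != NULL && (void *)elem != va) {
		elem = elem->next;
		/* the list is address-ordered, so stop once we are past va */
		if ((uintptr_t)elem > (uintptr_t)va)
			return -ENOENT;
	}
	/* no such element, or len is not the length of the original chunk */
	if (elem == NULL || elem->chunk_len != len)
		return -ENOENT;
	/* an allocated or split element means the chunk is still in use */
	if (elem->state == TOY_BUSY || elem->size != len)
		return -EBUSY;
	return 0;
}
```

A free element whose size equals the chunk length is the only case in
which removal would proceed in this model.
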
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++
 lib/librte_eal/common/malloc_heap.c        | 54 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 125 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index fb5b6e2f7..40bae4478 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -302,6 +302,33 @@ int __rte_experimental
 rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+/**
+ * Remove memory chunk from heap with specified name.
+ *
+ * @note Memory chunk being removed must be the same as one that was added;
+ *   partially removing memory chunks is not supported
+ *
+ * @note Memory area must not contain any allocated elements to allow its
+ *   removal from the heap
+ *
+ * @param heap_name
+ *   Name of the heap to remove memory from
+ * @param va_addr
+ *   Virtual address to remove from the heap
+ * @param len
+ *   Length of virtual area to remove from the heap
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to remove memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ *     EBUSY  - memory chunk still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 36bfd53d3..c52d84419 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1019,6 +1019,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+	struct malloc_heap *heap = elem->heap;
+	struct rte_memseg_list *msl;
+
+	msl = elem->msl;
+
+	/* this element can be removed */
+	malloc_elem_free_list_remove(elem);
+	malloc_elem_hide_region(elem, elem, len);
+
+	heap->total_size -= len;
+
+	memset(elem, 0, sizeof(*elem));
+
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -1093,6 +1119,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	return 0;
 }
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len)
+{
+	struct malloc_elem *elem = heap->first;
+
+	/* find element with specified va address */
+	while (elem != NULL && elem != va_addr) {
+		elem = elem->next;
+		/* stop if we've blown past our VA */
+		if (elem > (struct malloc_elem *)va_addr) {
+			rte_errno = ENOENT;
+			return -1;
+		}
+	}
+	/* check if element was found */
+	if (elem == NULL || elem->msl->len != len) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+	/* if element's size is not equal to segment len, segment is busy */
+	if (elem->state == ELEM_BUSY || elem->size != len) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+	return destroy_seg(elem, len);
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 237ce9dc2..e48996d52 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -43,6 +43,10 @@ int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index acdbd92a2..bfc49d0b7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -379,6 +379,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot remove memory from internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 02254042c..8c66d0be9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v6 15/21] malloc: allow attaching to external memory chunks
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (14 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 14/21] malloc: allow removing memory from " Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 16/21] malloc: allow detaching from external memory Anatoly Burakov
                           ` (5 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
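For readers skimming the diff, the lookup performed on attach can be
modelled roughly as follows. This is a sketch under assumptions: the
toy_msl type and toy_attach() are hypothetical stand-ins, and setting
the "attached" flag takes the place of the fbarray attach call. What
it illustrates is the matching rule used by the walk callback: a list
matches only when both its base address and its total length (page
size times number of pages) equal the caller's arguments, and the
result stays ENOENT unless a match is found.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a memseg list; not the DPDK structure. */
struct toy_msl {
	void *base_va;   /* start of the memory area */
	size_t page_sz;  /* page size of the area */
	size_t n_pages;  /* number of pages in the area */
	int attached;    /* set once this process has attached */
};

/* Return 0 if a list matching (va, len) was found and attached. */
static int
toy_attach(struct toy_msl *lists, size_t n_lists, void *va, size_t len)
{
	size_t i;

	for (i = 0; i < n_lists; i++) {
		struct toy_msl *msl = &lists[i];

		/* both the start address and the total length must match */
		if (msl->base_va == va &&
				msl->page_sz * msl->n_pages == len) {
			msl->attached = 1;
			return 0;
		}
	}
	/* fail unless explicitly told to succeed */
	return -ENOENT;
}
```
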
 lib/librte_eal/common/include/rte_malloc.h | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 83 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 112 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 40bae4478..793f9473a 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket,
  *
  * @note Multiple memory chunks can be added to the same heap
  *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling
+ *   ``rte_malloc_heap_memory_attach``.
+ *
  * @note Memory must be previously allocated for DPDK to be able to use it as a
  *   malloc heap. Failing to do so will result in undefined behavior, up to and
  *   including segmentation faults.
@@ -329,6 +333,30 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 int __rte_experimental
 rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Attach to an already existing chunk of external memory in another process.
+ *
+ * @note This function must be called before any attempt is made to use an
+ *   already existing external memory chunk. This function does *not* need to
+ *   be called if a call to ``rte_malloc_heap_memory_add`` was made in the
+ *   current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to attach to
+ * @param len
+ *   Length of memory chunk to attach to
+ * @return
+ *   0 on successful attach
+ *   -1 on unsuccessful attach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to attach memory to a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index bfc49d0b7..5078235b1 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -418,6 +418,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+struct sync_mem_walk_arg {
+	void *va_addr;
+	size_t len;
+	int result;
+};
+
+static int
+attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct sync_mem_walk_arg *wa = arg;
+	size_t len = msl->page_sz * msl->memseg_arr.len;
+
+	if (msl->base_va == wa->va_addr &&
+			len == wa->len) {
+		struct rte_memseg_list *found_msl;
+		int msl_idx, ret;
+
+		/* msl is const */
+		msl_idx = msl - mcfg->memsegs;
+		found_msl = &mcfg->memsegs[msl_idx];
+
+		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+
+		if (ret < 0)
+			wa->result = -rte_errno;
+		else
+			wa->result = 0;
+		return 1;
+	}
+	return 0;
+}
+
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	struct sync_mem_walk_arg wa;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to attach to internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* find corresponding memseg list to attach to */
+	wa.va_addr = va_addr;
+	wa.len = len;
+	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+
+	/* we're already holding a read lock */
+	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+
+	if (wa.result < 0) {
+		rte_errno = -wa.result;
+		ret = -1;
+	} else {
+		ret = 0;
+	}
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 8c66d0be9..920852042 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_attach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v6 16/21] malloc: allow detaching from external memory
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (15 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 17/21] malloc: enable event callbacks for " Anatoly Burakov
                           ` (4 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to detach from an existing chunk of external memory in a
process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 31 +++++++++++++++++-----
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 793f9473a..7249e6aae 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
  * @note Memory area must not contain any allocated elements to allow its
  *   removal from the heap
  *
+ * @note All other processes must detach from the memory chunk prior to it being
+ *   removed from the heap.
+ *
  * @param heap_name
  *   Name of the heap to remove memory from
  * @param va_addr
@@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 int __rte_experimental
 rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Detach from a chunk of external memory in secondary process.
+ *
+ * @note This function must be called before any attempt is made to remove
+ *   external memory from the heap in another process. This function does *not*
+ *   need to be called if ``rte_malloc_heap_memory_remove`` will be called in
+ *   the current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to detach from
+ * @param len
+ *   Length of memory chunk to detach from
+ * @return
+ *   0 on successful detach
+ *   -1 on unsuccessful detach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to detach memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5078235b1..72e42b337 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -422,10 +422,11 @@ struct sync_mem_walk_arg {
 	void *va_addr;
 	size_t len;
 	int result;
+	bool attach;
 };
 
 static int
-attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct sync_mem_walk_arg *wa = arg;
@@ -440,7 +441,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		if (wa->attach)
+			ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		else
+			ret = rte_fbarray_detach(&found_msl->memseg_arr);
 
 		if (ret < 0)
 			wa->result = -rte_errno;
@@ -451,8 +455,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 	return 0;
 }
 
-int
-rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+static int
+sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
@@ -475,20 +479,21 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 		ret = -1;
 		goto unlock;
 	}
-	/* we shouldn't be able to attach to internal heaps */
+	/* we shouldn't be able to sync to internal heaps */
 	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
 		rte_errno = EPERM;
 		ret = -1;
 		goto unlock;
 	}
 
-	/* find corresponding memseg list to attach to */
+	/* find corresponding memseg list to sync to */
 	wa.va_addr = va_addr;
 	wa.len = len;
 	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+	wa.attach = attach;
 
 	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
 
 	if (wa.result < 0) {
 		rte_errno = -wa.result;
@@ -501,6 +506,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, true);
+}
+
+int
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, false);
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 920852042..30583eef2 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -323,6 +323,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
 	rte_malloc_heap_memory_attach;
+	rte_malloc_heap_memory_detach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v6 17/21] malloc: enable event callbacks for external memory
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (16 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 16/21] malloc: allow detaching from external memory Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 18/21] test: add unit tests for external memory support Anatoly Burakov
                           ` (3 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: Hemant Agrawal, Shreyansh Jain, Maxime Coquelin, Tiwei Bie,
	Zhihong Wang, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, thomas, shahafs, arybchenko, alejandro.lucero

When adding or removing external memory from the memory map, there
may be actions that need to be taken on account of this memory (e.g.
DMA mapping). Add support for triggering callbacks when adding,
removing, attaching or detaching external memory.

Some memory event callback handlers will need additional logic to
handle external memory regions. For example, virtio callback has to
completely ignore externally allocated memory, because there is no
way to find file descriptors backing the memory address in a
generic fashion. All other callbacks have also been adjusted to
handle RTE_BAD_IOVA as IOVA address, as this is one of the expected
use cases for external memory support.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
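The handling of RTE_BAD_IOVA in the callbacks follows one pattern:
walk the segments that cover the notified area, skip any segment with
no valid IOVA, and keep advancing. A minimal standalone sketch of that
loop is below. The toy_seg type and TOY_BAD_IOVA are illustrative
stand-ins (the all-ones value is an assumption made for this example),
and counting segments takes the place of the actual DMA map or unmap
call.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BAD_IOVA UINT64_MAX /* stand-in for RTE_BAD_IOVA */

/* Toy stand-in for a memseg: only the fields the loop needs. */
struct toy_seg {
	uint64_t iova; /* IO address, or TOY_BAD_IOVA if unknown */
	size_t len;    /* segment length, assumed to be non-zero */
};

/* Count the segments covering len bytes that would be DMA-mapped. */
static size_t
toy_count_mappable(const struct toy_seg *ms, size_t len)
{
	size_t cur_len = 0, mapped = 0;

	/* segments are contiguous, so walk them until len is covered */
	while (cur_len < len) {
		/* segments with an invalid IOVA are skipped, not mapped */
		if (ms->iova != TOY_BAD_IOVA)
			mapped++;
		cur_len += ms->len;
		++ms;
	}
	return mapped;
}
```
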
 drivers/bus/fslmc/fslmc_vfio.c                |  7 +++++
 .../net/virtio/virtio_user/virtio_user_dev.c  |  8 ++++++
 lib/librte_eal/common/malloc_heap.c           |  7 +++++
 lib/librte_eal/common/rte_malloc.c            | 27 ++++++++++++++++---
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 10 +++++--
 5 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 2e9244fb7..001852217 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -221,6 +221,13 @@ fslmc_memevent_cb(enum rte_mem_event type, const void *addr, size_t len,
 					"alloc" : "dealloc",
 				va, virt_addr, iova_addr, map_len);
 
+		/* iova_addr may be set to RTE_BAD_IOVA */
+		if (iova_addr == RTE_BAD_IOVA) {
+			DPAA2_BUS_DEBUG("Segment has invalid iova, skipping\n");
+			cur_len += map_len;
+			continue;
+		}
+
 		if (type == RTE_MEM_EVENT_ALLOC)
 			ret = fslmc_map_dma(virt_addr, iova_addr, map_len);
 		else
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 7df600b02..de813d0df 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -13,6 +13,8 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 
+#include <rte_eal_memconfig.h>
+
 #include "vhost.h"
 #include "virtio_user_dev.h"
 #include "../virtio_ethdev.h"
@@ -282,8 +284,14 @@ virtio_user_mem_event_cb(enum rte_mem_event type __rte_unused,
 						 void *arg)
 {
 	struct virtio_user_dev *dev = arg;
+	struct rte_memseg_list *msl;
 	uint16_t i;
 
+	/* ignore externally allocated memory */
+	msl = rte_mem_virt2memseg_list(addr);
+	if (msl->external)
+		return;
+
 	pthread_mutex_lock(&dev->mutex);
 
 	if (dev->started == false)
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index c52d84419..5883714ba 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1027,6 +1027,9 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 
 	msl = elem->msl;
 
+	/* notify all subscribers that a memory area is going to be removed */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, elem, len);
+
 	/* this element can be removed */
 	malloc_elem_free_list_remove(elem);
 	malloc_elem_hide_region(elem, elem, len);
@@ -1116,6 +1119,10 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
 			heap->name, va_addr);
 
+	/* notify all subscribers that a new memory area has been added */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+			va_addr, seg_len);
+
 	return 0;
 }
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72e42b337..2c19c2f87 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -25,6 +25,7 @@
 #include <rte_malloc.h>
 #include "malloc_elem.h"
 #include "malloc_heap.h"
+#include "eal_memalloc.h"
 
 
 /* Free the memory space back to heap */
@@ -441,15 +442,29 @@ sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		if (wa->attach)
+		if (wa->attach) {
 			ret = rte_fbarray_attach(&found_msl->memseg_arr);
-		else
+		} else {
+			/* notify all subscribers that a memory area is about to
+			 * be removed
+			 */
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+					msl->base_va, msl->len);
 			ret = rte_fbarray_detach(&found_msl->memseg_arr);
+		}
 
-		if (ret < 0)
+		if (ret < 0) {
 			wa->result = -rte_errno;
-		else
+		} else {
+			/* notify all subscribers that a new memory area was
+			 * added
+			 */
+			if (wa->attach)
+				eal_memalloc_mem_event_notify(
+						RTE_MEM_EVENT_ALLOC,
+						msl->base_va, msl->len);
 			wa->result = 0;
+		}
 		return 1;
 	}
 	return 0;
@@ -499,6 +514,10 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 		rte_errno = -wa.result;
 		ret = -1;
 	} else {
+		/* notify all subscribers that a new memory area was added */
+		if (attach)
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+					va_addr, len);
 		ret = 0;
 	}
 unlock:
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index fddbc3b54..d7268e4ce 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -509,7 +509,7 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	msl = rte_mem_virt2memseg_list(addr);
 
 	/* for IOVA as VA mode, no need to care for IOVA addresses */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
+	if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) {
 		uint64_t vfio_va = (uint64_t)(uintptr_t)addr;
 		if (type == RTE_MEM_EVENT_ALLOC)
 			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
@@ -523,13 +523,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	/* memsegs are contiguous in memory */
 	ms = rte_mem_virt2memseg(addr, msl);
 	while (cur_len < len) {
+		/* some memory segments may have invalid IOVA */
+		if (ms->iova == RTE_BAD_IOVA) {
+			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
+					ms->addr);
+			goto next;
+		}
 		if (type == RTE_MEM_EVENT_ALLOC)
 			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
 					ms->iova, ms->len, 1);
 		else
 			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
 					ms->iova, ms->len, 0);
-
+next:
 		cur_len += ms->len;
 		++ms;
 	}
-- 
2.17.1

* [dpdk-dev] [PATCH v6 18/21] test: add unit tests for external memory support
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (17 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 17/21] malloc: enable event callbacks for " Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 19/21] app/testpmd: add support for external memory Anatoly Burakov
                           ` (2 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add simple unit tests to test external memory support.
The tests are pretty basic and mostly consist of checking
if invalid API calls are handled correctly, plus a simple
allocation/deallocation test for malloc and memzone.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
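The invalid-name cases exercised by the test (NULL, empty, too long)
correspond to the name check that the heap APIs in this series share.
A rough standalone equivalent is sketched below. TOY_NAME_MAX_LEN is a
stand-in for RTE_HEAP_NAME_MAX_LEN, and the value 32 is an assumption
made for this example. toy_bounded_len() mirrors what strnlen() does,
written out by hand so the sketch needs only standard C.

```c
#include <stddef.h>

#define TOY_NAME_MAX_LEN 32 /* stand-in for RTE_HEAP_NAME_MAX_LEN */

/* Length of s, but never looking at more than max bytes. */
static size_t
toy_bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/* Return 1 if name is non-NULL, non-empty and fits with its terminator. */
static int
toy_name_is_valid(const char *name)
{
	size_t n;

	if (name == NULL)
		return 0;
	n = toy_bounded_len(name, TOY_NAME_MAX_LEN);
	/* hitting the bound means no terminator within the limit */
	return n != 0 && n != TOY_NAME_MAX_LEN;
}
```
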
 test/test/Makefile            |   1 +
 test/test/autotest_data.py    |  14 +-
 test/test/meson.build         |   1 +
 test/test/test_external_mem.c | 389 ++++++++++++++++++++++++++++++++++
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 test/test/test_external_mem.c

diff --git a/test/test/Makefile b/test/test/Makefile
index e6967bab6..074ac6e03 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c
 SRCS-y += test_reciprocal_division.c
 SRCS-y += test_reciprocal_division_perf.c
 SRCS-y += test_fbarray.c
+SRCS-y += test_external_mem.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py
index f68d9b111..51f8e1689 100644
--- a/test/test/autotest_data.py
+++ b/test/test/autotest_data.py
@@ -477,10 +477,16 @@
         "Report":  None,
     },
     {
-        "Name":    "Fbarray autotest",
-        "Command": "fbarray_autotest",
-        "Func":    default_autotest,
-        "Report":  None,
+	"Name":    "Fbarray autotest",
+	"Command": "fbarray_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
+    },
+    {
+	"Name":    "External memory autotest",
+	"Command": "external_mem_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
     },
     #
     #Please always keep all dump tests at the end and together!
diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6eca2..3abf02b71 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -155,6 +155,7 @@ test_names = [
 	'eventdev_common_autotest',
 	'eventdev_octeontx_autotest',
 	'eventdev_sw_autotest',
+	'external_mem_autotest',
 	'func_reentrancy_autotest',
 	'flow_classify_autotest',
 	'hash_scaling_autotest',
diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c
new file mode 100644
index 000000000..d0837aa35
--- /dev/null
+++ b/test/test/test_external_mem.c
@@ -0,0 +1,389 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+
+#include "test.h"
+
+#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */
+
+static int
+test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova,
+		int n_pages)
+{
+	static const char * const names[] = {
+		NULL, /* NULL name */
+		"",   /* empty name */
+		"this heap name is definitely way too long to be valid"
+	};
+	const char *valid_name = "valid heap name";
+	unsigned int i;
+
+	/* check invalid name handling */
+	for (i = 0; i < RTE_DIM(names); i++) {
+		const char *name = names[i];
+
+		/* these calls may fail for other reasons, so check errno */
+		if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Created heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Destroyed heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_get_socket(name) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Found socket for heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_add(name, addr, len,
+				NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Added memory to heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Removed memory from heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Attached memory to heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Detached memory from heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* do same as above, but with a valid heap name */
+
+	/* skip create call */
+	if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Destroyed non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_get_socket(valid_name) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Found socket for non-existent heap\n",
+				__func__, __LINE__);
+		goto fail;
+	}
+
+	/* these calls may fail for other reasons, so check errno */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Added memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Removed memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Attached memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Detached memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* create a valid heap but test other invalid parameters */
+	if (rte_malloc_heap_create(valid_name) != 0) {
+		printf("%s():%i: Failed to create valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero length */
+	if (rte_malloc_heap_memory_add(valid_name, addr, 0,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero address */
+	if (rte_malloc_heap_memory_add(valid_name, NULL, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* wrong page count */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* tests passed, destroy heap */
+	if (rte_malloc_heap_destroy(valid_name) != 0) {
+		printf("%s():%i: Failed to destroy valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	return 0;
+fail:
+	rte_malloc_heap_destroy(valid_name);
+	return -1;
+}
+
+static int
+test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages)
+{
+	const char *heap_name = "heap";
+	void *ptr = NULL;
+	int socket_id, i;
+	const struct rte_memzone *mz = NULL;
+
+	/* create heap */
+	if (rte_malloc_heap_create(heap_name) != 0) {
+		printf("%s():%i: Failed to create malloc heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* get socket ID corresponding to this heap */
+	socket_id = rte_malloc_heap_get_socket(heap_name);
+	if (socket_id < 0) {
+		printf("%s():%i: cannot find socket for external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* heap is empty, so any allocation should fail */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr != NULL) {
+		printf("%s():%i: Allocated from empty heap\n", __func__,
+			__LINE__);
+		goto fail;
+	}
+
+	/* add memory to heap */
+	if (rte_malloc_heap_memory_add(heap_name, addr, len,
+			iova, n_pages, pgsz) != 0) {
+		printf("%s():%i: Failed to add memory to heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check that we can get this memory from EAL now */
+	for (i = 0; i < n_pages; i++) {
+		const struct rte_memseg *ms;
+		void *cur = RTE_PTR_ADD(addr, pgsz * i);
+
+		ms = rte_mem_virt2memseg(cur, NULL);
+		if (ms == NULL) {
+			printf("%s():%i: Failed to retrieve memseg for external mem\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (ms->addr != cur) {
+			printf("%s():%i: VA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+		if (ms->iova != iova[i]) {
+			printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* allocate - this now should succeed */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr == NULL) {
+		printf("%s():%i: Failed to allocate from external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check if address is in expected range */
+	if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) {
+		printf("%s():%i: Allocated from unexpected address space\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* we've allocated something - removing memory should fail */
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 ||
+			rte_errno != EBUSY) {
+		printf("%s():%i: Removing memory succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) {
+		printf("%s():%i: Destroying heap succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* try allocating an IOVA-contiguous memzone - this should succeed
+	 * because we've set up a contiguous IOVA table.
+	 */
+	mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id,
+			RTE_MEMZONE_IOVA_CONTIG);
+	if (mz == NULL) {
+		printf("%s():%i: Failed to reserve memzone\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	rte_malloc_dump_stats(stdout, NULL);
+	rte_malloc_dump_heaps(stdout);
+
+	/* free memory - removing it should now succeed */
+	rte_free(ptr);
+	ptr = NULL;
+
+	rte_memzone_free(mz);
+	mz = NULL;
+
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) {
+		printf("%s():%i: Removing memory from heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) != 0) {
+		printf("%s():%i: Destroying heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	return 0;
+fail:
+	rte_memzone_free(mz);
+	rte_free(ptr);
+	/* even if something failed, attempt to clean up */
+	rte_malloc_heap_memory_remove(heap_name, addr, len);
+	rte_malloc_heap_destroy(heap_name);
+
+	return -1;
+}
+
+/* we need to test attach/detach in secondary processes. */
+static int
+test_external_mem(void)
+{
+	size_t len = EXTERNAL_MEM_SZ;
+	size_t pgsz = RTE_PGSIZE_4K;
+	rte_iova_t iova[len / pgsz];
+	void *addr;
+	int ret, n_pages;
+
+	/* create external memory area */
+	n_pages = RTE_DIM(iova);
+	addr = mmap(NULL, len, PROT_WRITE | PROT_READ,
+			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED) {
+		printf("%s():%i: Failed to create dummy memory area\n",
+			__func__, __LINE__);
+		return -1;
+	}
+	for (int i = 0; i < n_pages; i++) {
+		/* arbitrary IOVA */
+		rte_iova_t tmp = 0x100000000 + i * pgsz;
+		iova[i] = tmp;
+	}
+
+	ret = test_invalid_param(addr, len, pgsz, iova, n_pages);
+	ret |= test_basic(addr, len, pgsz, iova, n_pages);
+
+	munmap(addr, len);
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem);
-- 
2.17.1
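
For readers following the "wrong page count" cases in test_invalid_param()
above: when an IOVA table is supplied, the test expects the call to be
rejected unless the table has exactly one entry per page. A minimal
standalone sketch of that rule follows. It is only a model of what the
test exercises, not the actual EAL check, and the helper name
iova_table_len_ok() is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a DPDK API: models the rule the unit test
 * exercises, i.e. a supplied IOVA table is only acceptable if it has
 * exactly one entry per page of the memory area.
 */
static bool
iova_table_len_ok(size_t len, size_t pgsz, unsigned int n_pages)
{
	/* zero-sized or non page-aligned areas cannot be described */
	if (len == 0 || pgsz == 0 || len % pgsz != 0)
		return false;
	return n_pages == len / pgsz;
}
```

With the 4M area and 4K pages used by the test, only n_pages == 1024
passes; 0, n_pages - 1 and n_pages + 1 are all rejected, which matches
the three EINVAL cases above.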

* [dpdk-dev] [PATCH v6 19/21] app/testpmd: add support for external memory
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (18 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 18/21] test: add unit tests for external memory support Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 20/21] doc: add external memory feature to the release notes Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger, John McNamara,
	Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

Currently, mempools can only be allocated either using native
DPDK memory, or anonymous memory. This patch will add two new
methods to allocate mempool using external memory (regular or
hugepage memory), and add documentation about it to testpmd
user guide.

It adds a new flag "--mp-alloc", with four possible values:
native (use regular DPDK allocator), anon (use anonymous
mempool), xmem (use externally allocated memory area), and
xmemhuge (use externally allocated hugepage memory area). Old
flag "--mp-anon" is kept for compatibility.
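
The mapping from the option string to the allocation mode could be
sketched as a small lookup table, as below. This is an illustration
only: the patch parses the option inline in launch_args_parse(), and
the helper name mp_alloc_from_str() is hypothetical. The enum mirrors
the one this patch adds to testpmd.h.

```c
#include <string.h>

/* Mirrors the enum added to testpmd.h by this patch. */
enum mp_alloc {
	MP_ALLOC_NATIVE,
	MP_ALLOC_ANON,
	MP_ALLOC_XMEM,
	MP_ALLOC_XMEM_HUGE
};

/*
 * Hypothetical helper: returns 0 and stores the mode on success,
 * -1 if the string is not one of the four accepted values.
 */
static int
mp_alloc_from_str(const char *arg, enum mp_alloc *mode)
{
	static const struct {
		const char *name;
		enum mp_alloc mode;
	} tbl[] = {
		{ "native", MP_ALLOC_NATIVE },
		{ "anon", MP_ALLOC_ANON },
		{ "xmem", MP_ALLOC_XMEM },
		{ "xmemhuge", MP_ALLOC_XMEM_HUGE },
	};
	size_t i;

	if (arg == NULL || mode == NULL)
		return -1;
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (strcmp(arg, tbl[i].name) == 0) {
			*mode = tbl[i].mode;
			return 0;
		}
	}
	return -1;
}
```

A table keeps the accepted strings in one place, so the usage text and
the error message are easier to keep in sync with the parser.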

All external memory is allocated using the same external heap,
but each mempool will allocate and add a new memory area.
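
Each such memory area is sized pessimistically by calc_mem_size() in
the diff below. The arithmetic can be sketched standalone as follows.
This is a model under stated assumptions: obj_sz stands in for the
value the patch obtains from rte_mempool_calc_obj_size(), and the name
estimate_area_size() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* fixed 32MB allowance for mempool overhead, as in the patch */
#define HDR_MEM_EST ((uint64_t)32 << 20)

/*
 * Standalone model of the estimate. Returns 0 on success, -1 if an
 * object does not fit into a single page.
 */
static int
estimate_area_size(uint32_t nb_objs, uint64_t obj_sz, uint64_t pgsz,
		uint64_t *out)
{
	uint64_t objs_per_pg, n_pages, total;

	if (obj_sz == 0 || pgsz == 0 || obj_sz > pgsz)
		return -1;

	/* objects never straddle a page boundary in this estimate */
	objs_per_pg = pgsz / obj_sz;
	/* round up: a partially filled last page still costs a full page */
	n_pages = (nb_objs + objs_per_pg - 1) / objs_per_pg;

	total = HDR_MEM_EST + n_pages * pgsz;
	/* round the total up to a whole number of pages */
	total = (total + pgsz - 1) / pgsz * pgsz;

	*out = total;
	return 0;
}
```

For example, 1000 objects of 2304 bytes on 2M pages give 910 objects
per page, hence 2 pages of objects, and 32M + 4M = 36M in total.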

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/config.c                 |  21 +-
 app/test-pmd/parameters.c             |  23 +-
 app/test-pmd/testpmd.c                | 305 ++++++++++++++++++++++++--
 app/test-pmd/testpmd.h                |  13 +-
 doc/guides/testpmd_app_ug/run_app.rst |  12 +
 5 files changed, 349 insertions(+), 25 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a0f934932..4789910b3 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2413,6 +2413,23 @@ fwd_config_setup(void)
 		simple_fwd_config_setup();
 }
 
+static const char *
+mp_alloc_to_str(uint8_t mode)
+{
+	switch (mode) {
+	case MP_ALLOC_NATIVE:
+		return "native";
+	case MP_ALLOC_ANON:
+		return "anon";
+	case MP_ALLOC_XMEM:
+		return "xmem";
+	case MP_ALLOC_XMEM_HUGE:
+		return "xmemhuge";
+	default:
+		return "invalid";
+	}
+}
+
 void
 pkt_fwd_config_display(struct fwd_config *cfg)
 {
@@ -2421,12 +2438,12 @@ pkt_fwd_config_display(struct fwd_config *cfg)
 	streamid_t sm_id;
 
 	printf("%s packet forwarding%s - ports=%d - cores=%d - streams=%d - "
-		"NUMA support %s, MP over anonymous pages %s\n",
+		"NUMA support %s, MP allocation mode: %s\n",
 		cfg->fwd_eng->fwd_mode_name,
 		retry_enabled == 0 ? "" : " with retry",
 		cfg->nb_fwd_ports, cfg->nb_fwd_lcores, cfg->nb_fwd_streams,
 		numa_support == 1 ? "enabled" : "disabled",
-		mp_anon != 0 ? "enabled" : "disabled");
+		mp_alloc_to_str(mp_alloc_type));
 
 	if (retry_enabled)
 		printf("TX retry num: %u, delay between TX retries: %uus\n",
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 9220e1c1b..b4016668c 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -190,6 +190,11 @@ usage(char* progname)
 	printf("  --vxlan-gpe-port=N: UPD port of tunnel VXLAN-GPE\n");
 	printf("  --mlockall: lock all memory\n");
 	printf("  --no-mlockall: do not lock all memory\n");
+	printf("  --mp-alloc <native|anon|xmem|xmemhuge>: mempool allocation method.\n"
+	       "    native: use regular DPDK memory to create and populate mempool\n"
+	       "    anon: use regular DPDK memory to create and anonymous memory to populate mempool\n"
+	       "    xmem: use anonymous memory to create and populate mempool\n"
+	       "    xmemhuge: use anonymous hugepage memory to create and populate mempool\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -625,6 +630,7 @@ launch_args_parse(int argc, char** argv)
 		{ "vxlan-gpe-port",		1, 0, 0 },
 		{ "mlockall",			0, 0, 0 },
 		{ "no-mlockall",		0, 0, 0 },
+		{ "mp-alloc",			1, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -743,7 +749,22 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "numa"))
 				numa_support = 1;
 			if (!strcmp(lgopts[opt_idx].name, "mp-anon")) {
-				mp_anon = 1;
+				mp_alloc_type = MP_ALLOC_ANON;
+			}
+			if (!strcmp(lgopts[opt_idx].name, "mp-alloc")) {
+				if (!strcmp(optarg, "native"))
+					mp_alloc_type = MP_ALLOC_NATIVE;
+				else if (!strcmp(optarg, "anon"))
+					mp_alloc_type = MP_ALLOC_ANON;
+				else if (!strcmp(optarg, "xmem"))
+					mp_alloc_type = MP_ALLOC_XMEM;
+				else if (!strcmp(optarg, "xmemhuge"))
+					mp_alloc_type = MP_ALLOC_XMEM_HUGE;
+				else
+					rte_exit(EXIT_FAILURE,
+						"mp-alloc %s invalid - must be: "
+						"native, anon, xmem or xmemhuge\n",
+						 optarg);
 			}
 			if (!strcmp(lgopts[opt_idx].name, "port-numa-config")) {
 				if (parse_portnuma_config(optarg))
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e552..a6a2dbdeb 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -27,6 +27,7 @@
 #include <rte_log.h>
 #include <rte_debug.h>
 #include <rte_cycles.h>
+#include <rte_malloc_heap.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_launch.h>
@@ -63,6 +64,22 @@
 
 #include "testpmd.h"
 
+#ifndef MAP_HUGETLB
+/* FreeBSD may not have MAP_HUGETLB (in fact, it probably doesn't) */
+#define HUGE_FLAG (0x40000)
+#else
+#define HUGE_FLAG MAP_HUGETLB
+#endif
+
+#ifndef MAP_HUGE_SHIFT
+/* older kernels (or FreeBSD) will not have this define */
+#define HUGE_SHIFT (26)
+#else
+#define HUGE_SHIFT MAP_HUGE_SHIFT
+#endif
+
+#define EXTMEM_HEAP_NAME "extmem"
+
 uint16_t verbose_level = 0; /**< Silent by default. */
 int testpmd_logtype; /**< Log type for testpmd logs */
 
@@ -88,9 +105,13 @@ uint8_t numa_support = 1; /**< numa enabled by default */
 uint8_t socket_num = UMA_NO_CONFIG;
 
 /*
- * Use ANONYMOUS mapped memory (might be not physically continuous) for mbufs.
+ * Select mempool allocation type:
+ * - native: use regular DPDK memory
+ * - anon: use regular DPDK memory to create mempool, but populate using
+ *         anonymous memory (may not be IOVA-contiguous)
+ * - xmem: use externally allocated memory (xmemhuge: hugepage-backed)
  */
-uint8_t mp_anon = 0;
+uint8_t mp_alloc_type = MP_ALLOC_NATIVE;
 
 /*
  * Store specified sockets on which memory pool to be used by ports
@@ -527,6 +548,216 @@ set_def_fwd_config(void)
 	set_default_fwd_ports_config();
 }
 
+/* extremely pessimistic estimation of memory required to create a mempool */
+static int
+calc_mem_size(uint32_t nb_mbufs, uint32_t mbuf_sz, size_t pgsz, size_t *out)
+{
+	unsigned int n_pages, mbuf_per_pg, leftover;
+	uint64_t total_mem, mbuf_mem, obj_sz;
+
+	/* there is no good way to predict how much space the mempool will
+	 * occupy because it will allocate chunks on the fly, and some of those
+	 * will come from default DPDK memory while some will come from our
+	 * external memory, so just assume 32MB will be enough for everyone.
+	 */
+	uint64_t hdr_mem = 32 << 20;
+
+	/* account for possible non-contiguousness */
+	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
+	if (obj_sz > pgsz) {
+		TESTPMD_LOG(ERR, "Object size is bigger than page size\n");
+		return -1;
+	}
+
+	mbuf_per_pg = pgsz / obj_sz;
+	leftover = (nb_mbufs % mbuf_per_pg) > 0;
+	n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
+
+	mbuf_mem = n_pages * pgsz;
+
+	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
+
+	if (total_mem > SIZE_MAX) {
+		TESTPMD_LOG(ERR, "Memory size too big\n");
+		return -1;
+	}
+	*out = (size_t)total_mem;
+
+	return 0;
+}
+
+static inline uint32_t
+bsf64(uint64_t v)
+{
+	return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+	if (v == 0)
+		return 0;
+	v = rte_align64pow2(v);
+	return bsf64(v);
+}
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+	/* as per mmap() manpage, all page sizes are log2 of page size
+	 * shifted by MAP_HUGE_SHIFT
+	 */
+	int log2 = log2_u64(page_sz);
+	return (log2 << HUGE_SHIFT);
+}
+
+static void *
+alloc_mem(size_t memsz, size_t pgsz, bool huge)
+{
+	void *addr;
+	int flags;
+
+	/* allocate anonymous memory, backed by hugepages if requested */
+	flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	if (huge)
+		flags |= HUGE_FLAG | pagesz_flags(pgsz);
+
+	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
+	if (addr == MAP_FAILED)
+		return NULL;
+
+	return addr;
+}
+
+struct extmem_param {
+	void *addr;
+	size_t len;
+	size_t pgsz;
+	rte_iova_t *iova_table;
+	unsigned int iova_table_len;
+};
+
+static int
+create_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, struct extmem_param *param,
+		bool huge)
+{
+	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
+			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
+	unsigned int n_pages, pgsz_idx;
+	size_t mem_sz, cur_pgsz;
+	rte_iova_t *iovas = NULL;
+	void *addr;
+	int ret;
+
+	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
+		/* skip anything that is too big */
+		if (pgsizes[pgsz_idx] > SIZE_MAX)
+			continue;
+
+		cur_pgsz = pgsizes[pgsz_idx];
+
+		/* if we were told not to allocate hugepages, override */
+		if (!huge)
+			cur_pgsz = sysconf(_SC_PAGESIZE);
+
+		ret = calc_mem_size(nb_mbufs, mbuf_sz, cur_pgsz, &mem_sz);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot calculate memory size\n");
+			return -1;
+		}
+
+		/* allocate our memory */
+		addr = alloc_mem(mem_sz, cur_pgsz, huge);
+
+		/* if we couldn't allocate memory with a specified page size,
+		 * that doesn't mean we can't do it with other page sizes, so
+		 * try another one.
+		 */
+		if (addr == NULL)
+			continue;
+
+		/* store IOVA addresses for every page in this memory area */
+		n_pages = mem_sz / cur_pgsz;
+
+		iovas = malloc(sizeof(*iovas) * n_pages);
+
+		if (iovas == NULL) {
+			TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n");
+			goto fail;
+		}
+		/* lock memory if it's not huge pages */
+		if (!huge)
+			mlock(addr, mem_sz);
+
+		break;
+	}
+	/* if we couldn't allocate anything */
+	if (iovas == NULL)
+		return -1;
+
+	param->addr = addr;
+	param->len = mem_sz;
+	param->pgsz = cur_pgsz;
+	param->iova_table = iovas;
+	param->iova_table_len = n_pages;
+
+	return 0;
+fail:
+	if (iovas)
+		free(iovas);
+	if (addr)
+		munmap(addr, mem_sz);
+
+	return -1;
+}
+
+static int
+setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge)
+{
+	struct extmem_param param = {};
+	int socket_id, ret;
+
+	/* check if our heap exists */
+	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+	if (socket_id < 0) {
+		/* create our heap */
+		ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot create heap\n");
+			return -1;
+		}
+	}
+
+	ret = create_extmem(nb_mbufs, mbuf_sz, &param, huge);
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot create memory area\n");
+		return -1;
+	}
+
+	/* we now have a valid memory area, so add it to heap */
+	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
+			param.addr, param.len, param.iova_table,
+			param.iova_table_len, param.pgsz);
+
+	/* when using VFIO, memory is automatically mapped for DMA by EAL */
+
+	/* not needed any more */
+	free(param.iova_table);
+
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot add memory to heap\n");
+		munmap(param.addr, param.len);
+		return -1;
+	}
+
+	/* success */
+
+	TESTPMD_LOG(DEBUG, "Allocated %zuMB of external memory\n",
+			param.len >> 20);
+
+	return 0;
+}
+
 /*
  * Configuration initialisation done once at init time.
  */
@@ -545,27 +776,59 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
 		pool_name, nb_mbuf, mbuf_seg_size, socket_id);
 
-	if (mp_anon != 0) {
-		rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
-			mb_size, (unsigned) mb_mempool_cache,
-			sizeof(struct rte_pktmbuf_pool_private),
-			socket_id, 0);
-		if (rte_mp == NULL)
-			goto err;
+	switch (mp_alloc_type) {
+	case MP_ALLOC_NATIVE:
+		{
+			/* wrapper to rte_mempool_create() */
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+				mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+			break;
+		}
+	case MP_ALLOC_ANON:
+		{
+			rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				mb_size, (unsigned int) mb_mempool_cache,
+				sizeof(struct rte_pktmbuf_pool_private),
+				socket_id, 0);
+			if (rte_mp == NULL)
+				goto err;
+
+			if (rte_mempool_populate_anon(rte_mp) == 0) {
+				rte_mempool_free(rte_mp);
+				rte_mp = NULL;
+				goto err;
+			}
+			rte_pktmbuf_pool_init(rte_mp, NULL);
+			rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
+			break;
+		}
+	case MP_ALLOC_XMEM:
+	case MP_ALLOC_XMEM_HUGE:
+		{
+			int heap_socket;
+			bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE;
 
-		if (rte_mempool_populate_anon(rte_mp) == 0) {
-			rte_mempool_free(rte_mp);
-			rte_mp = NULL;
-			goto err;
+			if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0)
+				rte_exit(EXIT_FAILURE, "Could not create external memory\n");
+
+			heap_socket =
+				rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+			if (heap_socket < 0)
+				rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");
+
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+					mb_mempool_cache, 0, mbuf_seg_size,
+					heap_socket);
+			break;
+		}
+	default:
+		{
+			rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n");
 		}
-		rte_pktmbuf_pool_init(rte_mp, NULL);
-		rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
-	} else {
-		/* wrapper to rte_mempool_create() */
-		TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
-				rte_mbuf_best_mempool_ops());
-		rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-			mb_mempool_cache, 0, mbuf_seg_size, socket_id);
 	}
 
 err:
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a1f661472..65e0cec90 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -69,6 +69,16 @@ enum {
 	PORT_TOPOLOGY_LOOP,
 };
 
+enum {
+	MP_ALLOC_NATIVE, /**< allocate and populate mempool natively */
+	MP_ALLOC_ANON,
+	/**< allocate mempool natively, but populate using anonymous memory */
+	MP_ALLOC_XMEM,
+	/**< allocate and populate mempool using anonymous memory */
+	MP_ALLOC_XMEM_HUGE
+	/**< allocate and populate mempool using anonymous hugepage memory */
+};
+
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 /**
  * The data structure associated with RX and TX packet burst statistics
@@ -304,7 +314,8 @@ extern uint8_t  numa_support; /**< set by "--numa" parameter */
 extern uint16_t port_topology; /**< set by "--port-topology" parameter */
 extern uint8_t no_flush_rx; /**<set by "--no-flush-rx" parameter */
 extern uint8_t flow_isolate_all; /**< set by "--flow-isolate-all */
-extern uint8_t  mp_anon; /**< set by "--mp-anon" parameter */
+extern uint8_t  mp_alloc_type;
+/**< set by "--mp-anon" or "--mp-alloc" parameter */
 extern uint8_t no_link_check; /**<set by "--disable-link-check" parameter */
 extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f301c2b6f..67a8532a4 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -498,3 +498,15 @@ The commandline options are:
 *   ``--no-mlockall``
 
     Disable locking all memory.
+
+*   ``--mp-alloc <native|anon|xmem|xmemhuge>``
+
+    Select mempool allocation mode:
+
+    * native: create and populate mempool using native DPDK memory
+    * anon: create mempool using native DPDK memory, but populate using
+      anonymous memory
+    * xmem: create and populate mempool using externally and anonymously
+      allocated area
+    * xmemhuge: create and populate mempool using externally and anonymously
+      allocated hugepage area
-- 
2.17.1

* [dpdk-dev] [PATCH v6 20/21] doc: add external memory feature to the release notes
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (19 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 19/21] app/testpmd: add support for external memory Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Document the addition of external memory support to DPDK.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5065ec1af..4248ff4f9 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,6 +67,11 @@ New Features
   SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
   vdev_netvsc, tap, and failsafe drivers combination.
 
+* **Added support for using externally allocated memory in DPDK.**
+
+  DPDK has gained support for creating new ``rte_malloc`` heaps referencing
+  memory that was created outside of DPDK's own page allocator, and using that
+  memory natively with any other DPDK library or data structure.
 
 API Changes
 -----------
-- 
2.17.1

* [dpdk-dev] [PATCH v6 21/21] doc: add external memory feature to programmer's guide
  2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
                           ` (20 preceding siblings ...)
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 20/21] doc: add external memory feature to the release notes Anatoly Burakov
@ 2018-09-27 10:41         ` Anatoly Burakov
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add a short chapter on usage of external memory in DPDK to the
Programmer's Guide.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      | 37 +++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d362c9209..00ce64ceb 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -213,6 +213,43 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
+Support for Externally Allocated Memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to use externally allocated memory in DPDK, using a set of malloc
+heap APIs. Support for externally allocated memory is implemented through
+overloading the socket ID - externally allocated heaps will have socket IDs
+that would be considered invalid under normal circumstances. Requesting an
+allocation to take place from a specified externally allocated memory is a
+matter of supplying the correct socket ID to the DPDK allocator, either directly
+(e.g. through a call to ``rte_malloc``) or indirectly (through data
+structure-specific allocation APIs such as ``rte_ring_create``).
+
+Since there is no way DPDK can verify whether a memory area is available or
+valid, this responsibility falls on the shoulders of the user. All multiprocess
+synchronization is also the user's responsibility, as well as ensuring that all
+calls to add/attach/detach/remove memory are done in the correct order. It is
+not required to attach to a memory area in all processes - only attach to memory
+areas as needed.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Create a named heap
+* Add memory area(s) to the heap
+    - If IOVA table is not specified, IOVA addresses will be assumed to be
+      unavailable, and DMA mappings will not be performed
+    - Other processes must attach to the memory area before they can use it
+* Get socket ID used for the heap
+* Use normal DPDK allocation procedures, using supplied socket ID
+* If memory area is no longer needed, it can be removed from the heap
+    - Other processes must detach from this memory area before it can be removed
+* If heap is no longer needed, remove it
+    - Socket ID will become invalid and will not be reused
+
+For more information, please refer to ``rte_malloc`` API documentation,
+specifically the ``rte_malloc_heap_*`` family of function calls.
+
 PCI Access
 ~~~~~~~~~~
 
-- 
2.17.1

* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-27 11:03           ` Shreyansh Jain
  2018-09-27 11:08             ` Burakov, Anatoly
  2018-09-29  0:09           ` Yongseok Koh
  1 sibling, 1 reply; 225+ messages in thread
From: Shreyansh Jain @ 2018-09-27 11:03 UTC (permalink / raw)
  To: Anatoly Burakov, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
> 
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes. This also breaks a few
> internal assumptions about memory contiguousness, so adjust
> malloc code in a few places.
> 
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
> 
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
> 

Specifically for bus/fslmc perspective and generically for others:

Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

* Re: [dpdk-dev] [PATCH v6 01/21] mem: add length to memseg list
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-09-27 11:05           ` Shreyansh Jain
  0 siblings, 0 replies; 225+ messages in thread
From: Shreyansh Jain @ 2018-09-27 11:05 UTC (permalink / raw)
  To: Anatoly Burakov, dev
  Cc: Bruce Richardson, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, shahafs, arybchenko, alejandro.lucero

On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
> Previously, to calculate length of memory area covered by a memseg
> list, we would've needed to multiply page size by length of fbarray
> backing that memseg list. This is not obvious and unnecessarily
> low level, so store length in the memseg list itself.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>   drivers/bus/pci/linux/pci.c                       | 2 +-
>   lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
>   lib/librte_eal/common/eal_common_memory.c         | 5 ++---
>   lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
>   lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
>   lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
>   6 files changed, 11 insertions(+), 6 deletions(-)
> 

Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 11:03           ` Shreyansh Jain
@ 2018-09-27 11:08             ` Burakov, Anatoly
  2018-09-27 11:12               ` Shreyansh Jain
  0 siblings, 1 reply; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-27 11:08 UTC (permalink / raw)
  To: Shreyansh Jain, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>> When we allocate and use DPDK memory, we need to be able to
>> differentiate between DPDK hugepage segments and segments that
>> were made part of DPDK but are externally allocated. Add such
>> a property to memseg lists.
>>
>> This breaks the ABI, so bump the EAL library ABI version and
>> document the change in release notes. This also breaks a few
>> internal assumptions about memory contiguousness, so adjust
>> malloc code in a few places.
>>
>> All current calls for memseg walk functions were adjusted to
>> ignore external segments where it made sense.
>>
>> Mempools is a special case, because we may be asked to allocate
>> a mempool on a specific socket, and we need to ignore all page
>> sizes on other heaps or other sockets. Previously, this
>> assumption of knowing all page sizes was not a problem, but it
>> will be now, so we have to match socket ID with page size when
>> calculating minimum page size for a mempool.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>> ---
>>
> 
> Specifically for bus/fslmc perspective and generically for others:
> 
> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> 
> 

Actually, this patch may need some further adjustment, since it makes an
assumption about not wanting to map external memory for DMA.

Specifically - there's an fslmc dma map function that now skips external 
memory segments. Are you sure that's how it's supposed to be?

-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 11:08             ` Burakov, Anatoly
@ 2018-09-27 11:12               ` Shreyansh Jain
  2018-09-27 11:29                 ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Shreyansh Jain @ 2018-09-27 11:12 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

On Thursday 27 September 2018 04:38 PM, Burakov, Anatoly wrote:
> On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
>> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>>> When we allocate and use DPDK memory, we need to be able to
>>> differentiate between DPDK hugepage segments and segments that
>>> were made part of DPDK but are externally allocated. Add such
>>> a property to memseg lists.
>>>
>>> This breaks the ABI, so bump the EAL library ABI version and
>>> document the change in release notes. This also breaks a few
>>> internal assumptions about memory contiguousness, so adjust
>>> malloc code in a few places.
>>>
>>> All current calls for memseg walk functions were adjusted to
>>> ignore external segments where it made sense.
>>>
>>> Mempools is a special case, because we may be asked to allocate
>>> a mempool on a specific socket, and we need to ignore all page
>>> sizes on other heaps or other sockets. Previously, this
>>> assumption of knowing all page sizes was not a problem, but it
>>> will be now, so we have to match socket ID with page size when
>>> calculating minimum page size for a mempool.
>>>
>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>> ---
>>>
>>
>> Specifically for bus/fslmc perspective and generically for others:
>>
>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>
>>
> 
> Actually, this patch may need some further adjustment, since it makes an
> assumption about not wanting to map external memory for DMA.
> 
> Specifically - there's an fslmc dma map function that now skips external 
> memory segments. Are you sure that's how it's supposed to be?
> 

I thought that over.
For now yes. If we need to map external memory, and there is an event 
that would be called back, it should be handled separately. So, for 
example, a PMD level API to handle such requests from applications.

The point is that how the external memory is handled is use-case 
specific - the need to have its events reported back is definitely 
there, but its handling is still a grey area.

Once the patches make their way in, I can always come back and tune that.

* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 11:12               ` Shreyansh Jain
@ 2018-09-27 11:29                 ` Burakov, Anatoly
  0 siblings, 0 replies; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-27 11:29 UTC (permalink / raw)
  To: Shreyansh Jain, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

On 27-Sep-18 12:12 PM, Shreyansh Jain wrote:
> On Thursday 27 September 2018 04:38 PM, Burakov, Anatoly wrote:
>> On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
>>> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>>>> When we allocate and use DPDK memory, we need to be able to
>>>> differentiate between DPDK hugepage segments and segments that
>>>> were made part of DPDK but are externally allocated. Add such
>>>> a property to memseg lists.
>>>>
>>>> This breaks the ABI, so bump the EAL library ABI version and
>>>> document the change in release notes. This also breaks a few
>>>> internal assumptions about memory contiguousness, so adjust
>>>> malloc code in a few places.
>>>>
>>>> All current calls for memseg walk functions were adjusted to
>>>> ignore external segments where it made sense.
>>>>
>>>> Mempools is a special case, because we may be asked to allocate
>>>> a mempool on a specific socket, and we need to ignore all page
>>>> sizes on other heaps or other sockets. Previously, this
>>>> assumption of knowing all page sizes was not a problem, but it
>>>> will be now, so we have to match socket ID with page size when
>>>> calculating minimum page size for a mempool.
>>>>
>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>> ---
>>>>
>>>
>>> Specifically for bus/fslmc perspective and generically for others:
>>>
>>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>>
>>>
>>
>> Actually, this patch may need some further adjustment, since it makes an
>> assumption about not wanting to map external memory for DMA.
>>
>> Specifically - there's an fslmc dma map function that now skips 
>> external memory segments. Are you sure that's how it's supposed to be?
>>
> 
> I thought over that.
> For now yes. If we need to map external memory, and there is an event 
> that would be called back, it should be handled separately. So, for 
> example, a PMD level API to handle such requests from applications.

Well, technically such an event is already available, now that external 
memory allocations trigger mem events :)

> 
> The point is that how the external memory is handled is use-case 
> specific - the need to have its events reported back is definitely 
> there, but its handling is still a grey area.
> 
> Once the patches make their way in, I can always come back and tune that.
> 

OK, fair enough.

-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [PATCH v6 03/21] malloc: index heaps using heap ID rather than NUMA node
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-09-27 13:01           ` Alejandro Lucero
  2018-09-27 13:18             ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Alejandro Lucero @ 2018-09-27 13:01 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, Thomas Monjalon, Bruce Richardson, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Shreyansh Jain, Shahaf Shuler,
	Andrew Rybchenko

On Thu, Sep 27, 2018 at 11:47 AM Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> Switch over all parts of EAL to use heap ID instead of NUMA node
> ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
> node's index within the detected NUMA node list. Heap ID for
> external heaps will be order of their creation.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  config/common_base                            |  1 +
>  config/rte_config.h                           |  1 +
>  .../common/include/rte_eal_memconfig.h        |  4 +-
>  .../common/include/rte_malloc_heap.h          |  1 +
>  lib/librte_eal/common/malloc_heap.c           | 98 +++++++++++++------
>  lib/librte_eal/common/malloc_heap.h           |  3 +
>  lib/librte_eal/common/rte_malloc.c            | 41 +++++---
>  7 files changed, 106 insertions(+), 43 deletions(-)
>
> diff --git a/config/common_base b/config/common_base
> index 155c7d40e..b52770b27 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
>  CONFIG_RTE_LIBRTE_EAL=y
>  CONFIG_RTE_MAX_LCORE=128
>  CONFIG_RTE_MAX_NUMA_NODES=8
> +CONFIG_RTE_MAX_HEAPS=32
>  CONFIG_RTE_MAX_MEMSEG_LISTS=64
>  # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
>  # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is
> smaller
> diff --git a/config/rte_config.h b/config/rte_config.h
> index 567051b9c..5dd2ac1ad 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -24,6 +24,7 @@
>  #define RTE_BUILD_SHARED_LIB
>
>  /* EAL defines */
> +#define RTE_MAX_HEAPS 32
>  #define RTE_MAX_MEMSEG_LISTS 128
>  #define RTE_MAX_MEMSEG_PER_LIST 8192
>  #define RTE_MAX_MEM_MB_PER_LIST 32768
> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h
> b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index 6baa6854f..d7920a4e0 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -72,8 +72,8 @@ struct rte_mem_config {
>
>         struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for
> objects */
>
> -       /* Heaps of Malloc per socket */
> -       struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
> +       /* Heaps of Malloc */
> +       struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
>
>         /* address of mem_config in primary process. used to map shared
> config into
>          * exact same address the primary process maps it.
> diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h
> b/lib/librte_eal/common/include/rte_malloc_heap.h
> index d43fa9097..e7ac32d42 100644
> --- a/lib/librte_eal/common/include/rte_malloc_heap.h
> +++ b/lib/librte_eal/common/include/rte_malloc_heap.h
> @@ -27,6 +27,7 @@ struct malloc_heap {
>
>         unsigned alloc_count;
>         size_t total_size;
> +       unsigned int socket_id;
>  } __rte_cache_aligned;
>
>  #endif /* _RTE_MALLOC_HEAP_H_ */
> diff --git a/lib/librte_eal/common/malloc_heap.c
> b/lib/librte_eal/common/malloc_heap.c
> index 3c8e2063b..1d1e35708 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
>         return check_flag & flags;
>  }
>
> +int
> +malloc_socket_to_heap_id(unsigned int socket_id)
> +{
> +       struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> +       int i;
> +
> +       for (i = 0; i < RTE_MAX_HEAPS; i++) {
> +               struct malloc_heap *heap = &mcfg->malloc_heaps[i];
> +
> +               if (heap->socket_id == socket_id)
> +                       return i;
> +       }
> +       return -1;
> +}
> +
>  /*
>   * Expand the heap with a memory area.
>   */
> @@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl,
>         struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
>         struct rte_memseg_list *found_msl;
>         struct malloc_heap *heap;
> -       int msl_idx;
> +       int msl_idx, heap_idx;
>
>         if (msl->external)
>                 return 0;
>
> -       heap = &mcfg->malloc_heaps[msl->socket_id];
> +       heap_idx = malloc_socket_to_heap_id(msl->socket_id);
>

malloc_socket_to_heap_id() can return -1, so that case needs to be handled
here before the result is used as an array index.
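
A minimal, self-contained sketch of the kind of guard meant here. It is a
simplified model rather than the actual EAL code: MAX_HEAPS, struct
heap_model, the pre-filled socket IDs and add_seg_to_heap() are hypothetical
stand-ins, and only the lookup loop mirrors malloc_socket_to_heap_id() from
the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HEAPS 4 /* stand-in for RTE_MAX_HEAPS */

/* Hypothetical, much reduced version of struct malloc_heap. */
struct heap_model {
	unsigned int socket_id;
	size_t total_size;
};

/* Assumption: socket IDs are already assigned; the values are made up. */
struct heap_model heaps[MAX_HEAPS] = {
	{ .socket_id = 0 }, { .socket_id = 1 },
	{ .socket_id = 256 }, { .socket_id = 257 },
};

/* Same linear search as malloc_socket_to_heap_id() in the patch. */
int
socket_to_heap_id(unsigned int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++)
		if (heaps[i].socket_id == socket_id)
			return i;
	return -1;
}

/*
 * Account a segment of 'len' bytes to the heap owning 'socket_id'.
 * Returns 0 on success, -1 if no heap matches the socket ID.
 */
int
add_seg_to_heap(unsigned int socket_id, size_t len)
{
	int heap_idx = socket_to_heap_id(socket_id);

	/* never index the heap array with a failed lookup */
	if (heap_idx < 0)
		return -1;

	heaps[heap_idx].total_size += len;
	return 0;
}
```

Without the check, a failed lookup would index the heap array at -1 and
write outside of it.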


> +       heap = &mcfg->malloc_heaps[heap_idx];
>
>         /* msl is const, so find it */
>         msl_idx = msl - mcfg->memsegs;
> @@ -111,6 +127,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
>         malloc_heap_add_memory(heap, found_msl, ms->addr, len);
>
>         heap->total_size += len;
> +       heap->socket_id = msl->socket_id;
>
>         RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
>                         msl->socket_id);
> @@ -561,12 +578,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap,
> size_t size, int socket,
>
>  /* this will try lower page sizes first */
>  static void *
> -heap_alloc_on_socket(const char *type, size_t size, int socket,
> -               unsigned int flags, size_t align, size_t bound, bool
> contig)
> +malloc_heap_alloc_on_heap_id(const char *type, size_t size,
> +               unsigned int heap_id, unsigned int flags, size_t align,
> +               size_t bound, bool contig)
>  {
>         struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> -       struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
> +       struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
>         unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
> +       int socket_id;
>         void *ret;
>
>         rte_spinlock_lock(&(heap->lock));
> @@ -584,12 +603,28 @@ heap_alloc_on_socket(const char *type, size_t size,
> int socket,
>          * we may still be able to allocate memory from appropriate page
> sizes,
>          * we just need to request more memory first.
>          */
> +
> +       socket_id = rte_socket_id_by_idx(heap_id);
> +       /*
> +        * if socket ID is negative, we cannot find a socket ID for this
> heap -
> +        * which means it's an external heap. those can have unexpected
> page
> +        * sizes, so if the user asked to allocate from there - assume user
> +        * knows what they're doing, and allow allocating from there with
> any
> +        * page size flags.
> +        */
> +       if (socket_id < 0)
> +               size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
> +
>         ret = heap_alloc(heap, type, size, size_flags, align, bound,
> contig);
>         if (ret != NULL)
>                 goto alloc_unlock;
>
> -       if (!alloc_more_mem_on_socket(heap, size, socket, flags, align,
> bound,
> -                       contig)) {
> +       /* if socket ID is invalid, this is an external heap */
> +       if (socket_id < 0)
> +               goto alloc_unlock;
> +
> +       if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
> +                       bound, contig)) {
>                 ret = heap_alloc(heap, type, size, flags, align, bound,
> contig);
>
>                 /* this should have succeeded */
> @@ -605,7 +640,7 @@ void *
>  malloc_heap_alloc(const char *type, size_t size, int socket_arg,
>                 unsigned int flags, size_t align, size_t bound, bool
> contig)
>  {
> -       int socket, i, cur_socket;
> +       int socket, heap_id, i;
>         void *ret;
>
>         /* return NULL if size is 0 or alignment is not power-of-2 */
> @@ -620,22 +655,25 @@ malloc_heap_alloc(const char *type, size_t size, int
> socket_arg,
>         else
>                 socket = socket_arg;
>
> -       /* Check socket parameter */
> -       if (socket >= RTE_MAX_NUMA_NODES)
> +       /* turn socket ID into heap ID */
> +       heap_id = malloc_socket_to_heap_id(socket);
> +       /* if heap id is negative, socket ID was invalid */
> +       if (heap_id < 0)
>                 return NULL;
>
> -       ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
> -                       contig);
> +       ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags,
> align,
> +                       bound, contig);
>         if (ret != NULL || socket_arg != SOCKET_ID_ANY)
>                 return ret;
>
> -       /* try other heaps */
> +       /* try other heaps. we are only iterating through native DPDK
> sockets,
> +        * so external heaps won't be included.
> +        */
>         for (i = 0; i < (int) rte_socket_count(); i++) {
> -               cur_socket = rte_socket_id_by_idx(i);
> -               if (cur_socket == socket)
> +               if (i == heap_id)
>                         continue;
> -               ret = heap_alloc_on_socket(type, size, cur_socket, flags,
> -                               align, bound, contig);
> +               ret = malloc_heap_alloc_on_heap_id(type, size, i, flags,
> align,
> +                               bound, contig);
>                 if (ret != NULL)
>                         return ret;
>         }
> @@ -643,11 +681,11 @@ malloc_heap_alloc(const char *type, size_t size, int
> socket_arg,
>  }
>
>  static void *
> -heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int
> flags,
> -               size_t align, bool contig)
> +heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
> +               unsigned int flags, size_t align, bool contig)
>  {
>         struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> -       struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
> +       struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
>         void *ret;
>
>         rte_spinlock_lock(&(heap->lock));
> @@ -665,7 +703,7 @@ void *
>  malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int
> flags,
>                 size_t align, bool contig)
>  {
> -       int socket, i, cur_socket;
> +       int socket, i, cur_socket, heap_id;
>         void *ret;
>
>         /* return NULL if align is not power-of-2 */
> @@ -680,11 +718,13 @@ malloc_heap_alloc_biggest(const char *type, int
> socket_arg, unsigned int flags,
>         else
>                 socket = socket_arg;
>
> -       /* Check socket parameter */
> -       if (socket >= RTE_MAX_NUMA_NODES)
> +       /* turn socket ID into heap ID */
> +       heap_id = malloc_socket_to_heap_id(socket);
> +       /* if heap id is negative, socket ID was invalid */
> +       if (heap_id < 0)
>                 return NULL;
>
> -       ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
> +       ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
>                         contig);
>         if (ret != NULL || socket_arg != SOCKET_ID_ANY)
>                 return ret;
> @@ -694,8 +734,8 @@ malloc_heap_alloc_biggest(const char *type, int
> socket_arg, unsigned int flags,
>                 cur_socket = rte_socket_id_by_idx(i);
>                 if (cur_socket == socket)
>                         continue;
> -               ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
> -                               align, contig);
> +               ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
> +                               contig);
>                 if (ret != NULL)
>                         return ret;
>         }
> @@ -760,7 +800,7 @@ malloc_heap_free(struct malloc_elem *elem)
>         /* ...of which we can't avail if we are in legacy mode, or if this
> is an
>          * externally allocated segment.
>          */
> -       if (internal_config.legacy_mem || msl->external)
> +       if (internal_config.legacy_mem || (msl->external > 0))
>                 goto free_unlock;
>
>         /* check if we can free any memory back to the system */
> @@ -917,7 +957,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t
> size)
>  }
>
>  /*
> - * Function to retrieve data for heap on given socket
> + * Function to retrieve data for a given heap
>   */
>  int
>  malloc_heap_get_stats(struct malloc_heap *heap,
> @@ -955,7 +995,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
>  }
>
>  /*
> - * Function to retrieve data for heap on given socket
> + * Function to retrieve data for a given heap
>   */
>  void
>  malloc_heap_dump(struct malloc_heap *heap, FILE *f)
> diff --git a/lib/librte_eal/common/malloc_heap.h
> b/lib/librte_eal/common/malloc_heap.h
> index f52cb5559..61b844b6f 100644
> --- a/lib/librte_eal/common/malloc_heap.h
> +++ b/lib/librte_eal/common/malloc_heap.h
> @@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
>  void
>  malloc_heap_dump(struct malloc_heap *heap, FILE *f);
>
> +int
> +malloc_socket_to_heap_id(unsigned int socket_id);
> +
>  int
>  rte_eal_malloc_heap_init(void);
>
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index 47ca5a742..73d6df31d 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
>                 struct rte_malloc_socket_stats *socket_stats)
>  {
>         struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> +       int heap_idx, ret = -1;
>
> -       if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
> -               return -1;
> +       rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
>
> -       return malloc_heap_get_stats(&mcfg->malloc_heaps[socket],
> socket_stats);
> +       heap_idx = malloc_socket_to_heap_id(socket);
> +       if (heap_idx < 0)
> +               goto unlock;
> +
> +       ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
> +                       socket_stats);
> +unlock:
> +       rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
> +
> +       return ret;
>  }
>
>  /*
> @@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
>         struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
>         unsigned int idx;
>
> -       for (idx = 0; idx < rte_socket_count(); idx++) {
> -               unsigned int socket = rte_socket_id_by_idx(idx);
> -               fprintf(f, "Heap on socket %i:\n", socket);
> -               malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
> +       rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
> +
> +       for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
> +               fprintf(f, "Heap id: %u\n", idx);
> +               malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
>         }
>
> +       rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>  }
>
>  /*
> @@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
>  void
>  rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
>  {
> -       unsigned int socket;
> +       struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> +       unsigned int heap_id;
>         struct rte_malloc_socket_stats sock_stats;
> +
> +       rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
> +
>         /* Iterate through all initialised heaps */
> -       for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
> -               if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
> -                       continue;
> +       for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
> +               struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
>
> -               fprintf(f, "Socket:%u\n", socket);
> +               malloc_heap_get_stats(heap, &sock_stats);
> +
> +               fprintf(f, "Heap id:%u\n", heap_id);
>                 fprintf(f, "\tHeap_size:%zu,\n",
> sock_stats.heap_totalsz_bytes);
>                 fprintf(f, "\tFree_size:%zu,\n",
> sock_stats.heap_freesz_bytes);
>                 fprintf(f, "\tAlloc_size:%zu,\n",
> sock_stats.heap_allocsz_bytes);
> @@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char
> *type)
>                 fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
>                 fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
>         }
> +       rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
>         return;
>  }
>
> --
> 2.17.1
>

* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-27 13:14           ` Alejandro Lucero
  2018-09-27 13:21             ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Alejandro Lucero @ 2018-09-27 13:14 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
	Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko

On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> We will be assigning "invalid" socket ID's to external heap, and
> malloc will now be able to verify if a supplied socket ID is in
> fact a valid one, rendering parameter checks for sockets
> obsolete.
>
> This changes the semantics of what we understand by "socket ID",
> so document the change in the release notes.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
>  lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
>  lib/librte_eal/common/malloc_heap.c        | 2 +-
>  lib/librte_eal/common/rte_malloc.c         | 4 ----
>  4 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_18_11.rst
> b/doc/guides/rel_notes/release_18_11.rst
> index 5fc71e208..6ee236302 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -98,6 +98,13 @@ API Changes
>      users of memseg-walk-related functions, as they will now have to skip
>      externally allocated segments in most cases if the intent is to only
> iterate
>      over internal DPDK memory.
> +  - ``socket_id`` parameter across the entire DPDK has gained additional
> +    meaning, as some socket ID's will now be representing externally
> allocated
> +    memory. No changes will be required for existing code as backwards
> +    compatibility will be kept, and those who do not use this feature
> will not
> +    see these extra socket ID's. Any new API's must not check socket ID
> +    parameters themselves, and must instead leave it to the memory
> subsystem to
> +    decide whether socket ID is a valid one.
>
>  ABI Changes
>  -----------
> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> b/lib/librte_eal/common/eal_common_memzone.c
> index 7300fe05d..b7081afbf 100644
> --- a/lib/librte_eal/common/eal_common_memzone.c
> +++ b/lib/librte_eal/common/eal_common_memzone.c
> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
> *name, size_t len,
>                 return NULL;
>         }
>
> -       if ((socket_id != SOCKET_ID_ANY) &&
> -           (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
> +       if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
>

Wouldn't it be better to check against RTE_MAX_HEAPS instead of removing the
check?



>                 rte_errno = EINVAL;
>                 return NULL;
>         }
>
> -       if (!rte_eal_has_hugepages())
> +       /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
> +        * external heap.
> +        */
> +       if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
>                 socket_id = SOCKET_ID_ANY;
>
>         contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
> diff --git a/lib/librte_eal/common/malloc_heap.c
> b/lib/librte_eal/common/malloc_heap.c
> index 1d1e35708..73e478076 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
> socket_arg,
>         if (size == 0 || (align && !rte_is_power_of_2(align)))
>                 return NULL;
>
> -       if (!rte_eal_has_hugepages())
> +       if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
>                 socket_arg = SOCKET_ID_ANY;
>
>         if (socket_arg == SOCKET_ID_ANY)
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index 73d6df31d..9ba1472c3 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
> unsigned int align,
>         if (!rte_eal_has_hugepages())
>                 socket_arg = SOCKET_ID_ANY;
>
> -       /* Check socket parameter */
> -       if (socket_arg >= RTE_MAX_NUMA_NODES)
> -               return NULL;
> -
>

Same as before. Better to keep the sanity check using RTE_MAX_HEAPS.


>         return malloc_heap_alloc(type, size, socket_arg, 0,
>                         align == 0 ? 1 : align, 0, false);
>  }
> --
> 2.17.1
>


* Re: [dpdk-dev] [PATCH v6 03/21] malloc: index heaps using heap ID rather than NUMA node
  2018-09-27 13:01           ` Alejandro Lucero
@ 2018-09-27 13:18             ` Burakov, Anatoly
  2018-09-27 13:21               ` Alejandro Lucero
  0 siblings, 1 reply; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-27 13:18 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, Thomas Monjalon, Bruce Richardson, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Shreyansh Jain, Shahaf Shuler,
	Andrew Rybchenko

On 27-Sep-18 2:01 PM, Alejandro Lucero wrote:
> On Thu, Sep 27, 2018 at 11:47 AM Anatoly Burakov <anatoly.burakov@intel.com>
> wrote:
> 
>> Switch over all parts of EAL to use heap ID instead of NUMA node
>> ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
>> node's index within the detected NUMA node list. Heap ID for
>> external heaps will be order of their creation.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>   config/common_base                            |  1 +
>>   config/rte_config.h                           |  1 +
>>   .../common/include/rte_eal_memconfig.h        |  4 +-
>>   .../common/include/rte_malloc_heap.h          |  1 +
>>   lib/librte_eal/common/malloc_heap.c           | 98 +++++++++++++------
>>   lib/librte_eal/common/malloc_heap.h           |  3 +
>>   lib/librte_eal/common/rte_malloc.c            | 41 +++++---
>>   7 files changed, 106 insertions(+), 43 deletions(-)
>>
>> diff --git a/config/common_base b/config/common_base
>> index 155c7d40e..b52770b27 100644
>> --- a/config/common_base
>> +++ b/config/common_base
>> @@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
>>   CONFIG_RTE_LIBRTE_EAL=y
>>   CONFIG_RTE_MAX_LCORE=128
>>   CONFIG_RTE_MAX_NUMA_NODES=8
>> +CONFIG_RTE_MAX_HEAPS=32
>>   CONFIG_RTE_MAX_MEMSEG_LISTS=64
>>   # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
>>   # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is
>> smaller
>> diff --git a/config/rte_config.h b/config/rte_config.h
>> index 567051b9c..5dd2ac1ad 100644
>> --- a/config/rte_config.h
>> +++ b/config/rte_config.h
>> @@ -24,6 +24,7 @@
>>   #define RTE_BUILD_SHARED_LIB
>>
>>   /* EAL defines */
>> +#define RTE_MAX_HEAPS 32
>>   #define RTE_MAX_MEMSEG_LISTS 128
>>   #define RTE_MAX_MEMSEG_PER_LIST 8192
>>   #define RTE_MAX_MEM_MB_PER_LIST 32768
>> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h
>> b/lib/librte_eal/common/include/rte_eal_memconfig.h
>> index 6baa6854f..d7920a4e0 100644
>> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
>> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
>> @@ -72,8 +72,8 @@ struct rte_mem_config {
>>
>>          struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for
>> objects */
>>
>> -       /* Heaps of Malloc per socket */
>> -       struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
>> +       /* Heaps of Malloc */
>> +       struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
>>
>>          /* address of mem_config in primary process. used to map shared
>> config into
>>           * exact same address the primary process maps it.
>> diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h
>> b/lib/librte_eal/common/include/rte_malloc_heap.h
>> index d43fa9097..e7ac32d42 100644
>> --- a/lib/librte_eal/common/include/rte_malloc_heap.h
>> +++ b/lib/librte_eal/common/include/rte_malloc_heap.h
>> @@ -27,6 +27,7 @@ struct malloc_heap {
>>
>>          unsigned alloc_count;
>>          size_t total_size;
>> +       unsigned int socket_id;
>>   } __rte_cache_aligned;
>>
>>   #endif /* _RTE_MALLOC_HEAP_H_ */
>> diff --git a/lib/librte_eal/common/malloc_heap.c
>> b/lib/librte_eal/common/malloc_heap.c
>> index 3c8e2063b..1d1e35708 100644
>> --- a/lib/librte_eal/common/malloc_heap.c
>> +++ b/lib/librte_eal/common/malloc_heap.c
>> @@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
>>          return check_flag & flags;
>>   }
>>
>> +int
>> +malloc_socket_to_heap_id(unsigned int socket_id)
>> +{
>> +       struct rte_mem_config *mcfg =
>> rte_eal_get_configuration()->mem_config;
>> +       int i;
>> +
>> +       for (i = 0; i < RTE_MAX_HEAPS; i++) {
>> +               struct malloc_heap *heap = &mcfg->malloc_heaps[i];
>> +
>> +               if (heap->socket_id == socket_id)
>> +                       return i;
>> +       }
>> +       return -1;
>> +}
>> +
>>   /*
>>    * Expand the heap with a memory area.
>>    */
>> @@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl,
>>          struct rte_mem_config *mcfg =
>> rte_eal_get_configuration()->mem_config;
>>          struct rte_memseg_list *found_msl;
>>          struct malloc_heap *heap;
>> -       int msl_idx;
>> +       int msl_idx, heap_idx;
>>
>>          if (msl->external)
>>                  return 0;
>>
>> -       heap = &mcfg->malloc_heaps[msl->socket_id];
>> +       heap_idx = malloc_socket_to_heap_id(msl->socket_id);
>>
> 
> malloc_socket_to_heap_id can return -1 so it requires to handle that
> possibility.
> 

Not really, this is called from the memseg walk function - we know the msl
and its socket ID are valid. Or at least something has gone *very* wrong
if we got a -1 result :) However, I guess this check won't hurt.

-- 
Thanks,
Anatoly


* Re: [dpdk-dev] [PATCH v6 03/21] malloc: index heaps using heap ID rather than NUMA node
  2018-09-27 13:18             ` Burakov, Anatoly
@ 2018-09-27 13:21               ` Alejandro Lucero
  0 siblings, 0 replies; 225+ messages in thread
From: Alejandro Lucero @ 2018-09-27 13:21 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, Thomas Monjalon, Bruce Richardson, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Shreyansh Jain, Shahaf Shuler,
	Andrew Rybchenko

On Thu, Sep 27, 2018 at 2:18 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 27-Sep-18 2:01 PM, Alejandro Lucero wrote:
> > On Thu, Sep 27, 2018 at 11:47 AM Anatoly Burakov <
> anatoly.burakov@intel.com>
> > wrote:
> >
> >> Switch over all parts of EAL to use heap ID instead of NUMA node
> >> ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
> >> node's index within the detected NUMA node list. Heap ID for
> >> external heaps will be order of their creation.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >>   config/common_base                            |  1 +
> >>   config/rte_config.h                           |  1 +
> >>   .../common/include/rte_eal_memconfig.h        |  4 +-
> >>   .../common/include/rte_malloc_heap.h          |  1 +
> >>   lib/librte_eal/common/malloc_heap.c           | 98 +++++++++++++------
> >>   lib/librte_eal/common/malloc_heap.h           |  3 +
> >>   lib/librte_eal/common/rte_malloc.c            | 41 +++++---
> >>   7 files changed, 106 insertions(+), 43 deletions(-)
> >>
> >> diff --git a/config/common_base b/config/common_base
> >> index 155c7d40e..b52770b27 100644
> >> --- a/config/common_base
> >> +++ b/config/common_base
> >> @@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
> >>   CONFIG_RTE_LIBRTE_EAL=y
> >>   CONFIG_RTE_MAX_LCORE=128
> >>   CONFIG_RTE_MAX_NUMA_NODES=8
> >> +CONFIG_RTE_MAX_HEAPS=32
> >>   CONFIG_RTE_MAX_MEMSEG_LISTS=64
> >>   # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST
> pages
> >>   # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is
> >> smaller
> >> diff --git a/config/rte_config.h b/config/rte_config.h
> >> index 567051b9c..5dd2ac1ad 100644
> >> --- a/config/rte_config.h
> >> +++ b/config/rte_config.h
> >> @@ -24,6 +24,7 @@
> >>   #define RTE_BUILD_SHARED_LIB
> >>
> >>   /* EAL defines */
> >> +#define RTE_MAX_HEAPS 32
> >>   #define RTE_MAX_MEMSEG_LISTS 128
> >>   #define RTE_MAX_MEMSEG_PER_LIST 8192
> >>   #define RTE_MAX_MEM_MB_PER_LIST 32768
> >> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h
> >> b/lib/librte_eal/common/include/rte_eal_memconfig.h
> >> index 6baa6854f..d7920a4e0 100644
> >> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> >> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> >> @@ -72,8 +72,8 @@ struct rte_mem_config {
> >>
> >>          struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs
> for
> >> objects */
> >>
> >> -       /* Heaps of Malloc per socket */
> >> -       struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
> >> +       /* Heaps of Malloc */
> >> +       struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
> >>
> >>          /* address of mem_config in primary process. used to map shared
> >> config into
> >>           * exact same address the primary process maps it.
> >> diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h
> >> b/lib/librte_eal/common/include/rte_malloc_heap.h
> >> index d43fa9097..e7ac32d42 100644
> >> --- a/lib/librte_eal/common/include/rte_malloc_heap.h
> >> +++ b/lib/librte_eal/common/include/rte_malloc_heap.h
> >> @@ -27,6 +27,7 @@ struct malloc_heap {
> >>
> >>          unsigned alloc_count;
> >>          size_t total_size;
> >> +       unsigned int socket_id;
> >>   } __rte_cache_aligned;
> >>
> >>   #endif /* _RTE_MALLOC_HEAP_H_ */
> >> diff --git a/lib/librte_eal/common/malloc_heap.c
> >> b/lib/librte_eal/common/malloc_heap.c
> >> index 3c8e2063b..1d1e35708 100644
> >> --- a/lib/librte_eal/common/malloc_heap.c
> >> +++ b/lib/librte_eal/common/malloc_heap.c
> >> @@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t
> hugepage_sz)
> >>          return check_flag & flags;
> >>   }
> >>
> >> +int
> >> +malloc_socket_to_heap_id(unsigned int socket_id)
> >> +{
> >> +       struct rte_mem_config *mcfg =
> >> rte_eal_get_configuration()->mem_config;
> >> +       int i;
> >> +
> >> +       for (i = 0; i < RTE_MAX_HEAPS; i++) {
> >> +               struct malloc_heap *heap = &mcfg->malloc_heaps[i];
> >> +
> >> +               if (heap->socket_id == socket_id)
> >> +                       return i;
> >> +       }
> >> +       return -1;
> >> +}
> >> +
> >>   /*
> >>    * Expand the heap with a memory area.
> >>    */
> >> @@ -93,12 +108,13 @@ malloc_add_seg(const struct rte_memseg_list *msl,
> >>          struct rte_mem_config *mcfg =
> >> rte_eal_get_configuration()->mem_config;
> >>          struct rte_memseg_list *found_msl;
> >>          struct malloc_heap *heap;
> >> -       int msl_idx;
> >> +       int msl_idx, heap_idx;
> >>
> >>          if (msl->external)
> >>                  return 0;
> >>
> >> -       heap = &mcfg->malloc_heaps[msl->socket_id];
> >> +       heap_idx = malloc_socket_to_heap_id(msl->socket_id);
> >>
> >
> > malloc_socket_to_heap_id can return -1 so it requires to handle that
> > possibility.
> >
>
> Not really, this is called from memseg walk function - we know the msl
> and its socket ID are valid. Or at least something has gone *very* wrong
> if we got a -1 result :) However, i guess this check won't hurt.
>
>
Although that error is impossible now, not doing the check could become a
problem if another code path is added in the future where socket_id has not
been checked yet.
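
To make the suggestion concrete, here is a minimal standalone sketch of a
defensive check on the lookup result. It is not the DPDK code: toy_heap,
toy_socket_to_heap_id() and toy_add_seg() are hypothetical stand-ins for
struct malloc_heap, malloc_socket_to_heap_id() and malloc_add_seg(), and the
in_use flag is an assumption added so that empty slots never match.

```c
#include <stddef.h>

#define MAX_HEAPS 32 /* stand-in for RTE_MAX_HEAPS */

/* Hypothetical, simplified heap descriptor. */
struct toy_heap {
	int in_use;             /* assumption: marks an occupied slot */
	unsigned int socket_id; /* externally visible identifier */
	size_t total_size;      /* bytes added to this heap so far */
};

/* Two internal heaps, for sockets 0 and 1; the other slots are vacant. */
static struct toy_heap heaps[MAX_HEAPS] = { {1, 0, 0}, {1, 1, 0} };

/* Linear search, mirroring the lookup added by the patch. */
static int
toy_socket_to_heap_id(unsigned int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (heaps[i].in_use && heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}

/* Account a segment to its heap, refusing unknown socket IDs. */
static int
toy_add_seg(unsigned int socket_id, size_t len)
{
	int heap_idx = toy_socket_to_heap_id(socket_id);

	/* Without this check, heaps[-1] would be touched on a failed lookup. */
	if (heap_idx < 0)
		return -1;

	heaps[heap_idx].total_size += len;
	return 0;
}
```

With this in place, toy_add_seg(1, 4096) succeeds and toy_add_seg(5, 4096)
returns -1 instead of indexing the array with a negative value.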


> --
> Thanks,
> Anatoly
>


* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
  2018-09-27 13:14           ` Alejandro Lucero
@ 2018-09-27 13:21             ` Burakov, Anatoly
  2018-09-27 13:42               ` Alejandro Lucero
  0 siblings, 1 reply; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-27 13:21 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
	Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko

On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
> On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <anatoly.burakov@intel.com>
> wrote:
> 
>> We will be assigning "invalid" socket ID's to external heap, and
>> malloc will now be able to verify if a supplied socket ID is in
>> fact a valid one, rendering parameter checks for sockets
>> obsolete.
>>
>> This changes the semantics of what we understand by "socket ID",
>> so document the change in the release notes.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>   doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
>>   lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
>>   lib/librte_eal/common/malloc_heap.c        | 2 +-
>>   lib/librte_eal/common/rte_malloc.c         | 4 ----
>>   4 files changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/release_18_11.rst
>> b/doc/guides/rel_notes/release_18_11.rst
>> index 5fc71e208..6ee236302 100644
>> --- a/doc/guides/rel_notes/release_18_11.rst
>> +++ b/doc/guides/rel_notes/release_18_11.rst
>> @@ -98,6 +98,13 @@ API Changes
>>       users of memseg-walk-related functions, as they will now have to skip
>>       externally allocated segments in most cases if the intent is to only
>> iterate
>>       over internal DPDK memory.
>> +  - ``socket_id`` parameter across the entire DPDK has gained additional
>> +    meaning, as some socket ID's will now be representing externally
>> allocated
>> +    memory. No changes will be required for existing code as backwards
>> +    compatibility will be kept, and those who do not use this feature
>> will not
>> +    see these extra socket ID's. Any new API's must not check socket ID
>> +    parameters themselves, and must instead leave it to the memory
>> subsystem to
>> +    decide whether socket ID is a valid one.
>>
>>   ABI Changes
>>   -----------
>> diff --git a/lib/librte_eal/common/eal_common_memzone.c
>> b/lib/librte_eal/common/eal_common_memzone.c
>> index 7300fe05d..b7081afbf 100644
>> --- a/lib/librte_eal/common/eal_common_memzone.c
>> +++ b/lib/librte_eal/common/eal_common_memzone.c
>> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
>> *name, size_t len,
>>                  return NULL;
>>          }
>>
>> -       if ((socket_id != SOCKET_ID_ANY) &&
>> -           (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
>> +       if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
>>
> 
> Should not it be better to use RTE_MAX_HEAP instead of removing the check?

First of all, the maximum number of heaps should not concern the rest of
the code - this is purely an internal detail of rte_malloc.

More importantly, socket ID is completely independent of the number of
heaps. The socket ID is incremented each time a new heap is created, and
socket IDs are not reused. If you create and destroy a heap 100 times,
you'll get 100 different socket IDs, even though the max number of heaps
is less than that.

> 
> 
> 
>>                  rte_errno = EINVAL;
>>                  return NULL;
>>          }
>>
>> -       if (!rte_eal_has_hugepages())
>> +       /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
>> +        * external heap.
>> +        */
>> +       if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
>>                  socket_id = SOCKET_ID_ANY;
>>
>>          contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
>> diff --git a/lib/librte_eal/common/malloc_heap.c
>> b/lib/librte_eal/common/malloc_heap.c
>> index 1d1e35708..73e478076 100644
>> --- a/lib/librte_eal/common/malloc_heap.c
>> +++ b/lib/librte_eal/common/malloc_heap.c
>> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
>> socket_arg,
>>          if (size == 0 || (align && !rte_is_power_of_2(align)))
>>                  return NULL;
>>
>> -       if (!rte_eal_has_hugepages())
>> +       if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
>>                  socket_arg = SOCKET_ID_ANY;
>>
>>          if (socket_arg == SOCKET_ID_ANY)
>> diff --git a/lib/librte_eal/common/rte_malloc.c
>> b/lib/librte_eal/common/rte_malloc.c
>> index 73d6df31d..9ba1472c3 100644
>> --- a/lib/librte_eal/common/rte_malloc.c
>> +++ b/lib/librte_eal/common/rte_malloc.c
>> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
>> unsigned int align,
>>          if (!rte_eal_has_hugepages())
>>                  socket_arg = SOCKET_ID_ANY;
>>
>> -       /* Check socket parameter */
>> -       if (socket_arg >= RTE_MAX_NUMA_NODES)
>> -               return NULL;
>> -
>>
> 
> Sane than before. Better to keep the sanity check using RTE_MAX_HEAPS.

same as above :)


-- 
Thanks,
Anatoly


* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
  2018-09-27 13:21             ` Burakov, Anatoly
@ 2018-09-27 13:42               ` Alejandro Lucero
  2018-09-27 14:04                 ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Alejandro Lucero @ 2018-09-27 13:42 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
	Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko

On Thu, Sep 27, 2018 at 2:22 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
> > On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <
> anatoly.burakov@intel.com>
> > wrote:
> >
> >> We will be assigning "invalid" socket ID's to external heap, and
> >> malloc will now be able to verify if a supplied socket ID is in
> >> fact a valid one, rendering parameter checks for sockets
> >> obsolete.
> >>
> >> This changes the semantics of what we understand by "socket ID",
> >> so document the change in the release notes.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >>   doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
> >>   lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
> >>   lib/librte_eal/common/malloc_heap.c        | 2 +-
> >>   lib/librte_eal/common/rte_malloc.c         | 4 ----
> >>   4 files changed, 13 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/doc/guides/rel_notes/release_18_11.rst
> >> b/doc/guides/rel_notes/release_18_11.rst
> >> index 5fc71e208..6ee236302 100644
> >> --- a/doc/guides/rel_notes/release_18_11.rst
> >> +++ b/doc/guides/rel_notes/release_18_11.rst
> >> @@ -98,6 +98,13 @@ API Changes
> >>       users of memseg-walk-related functions, as they will now have to
> skip
> >>       externally allocated segments in most cases if the intent is to
> only
> >> iterate
> >>       over internal DPDK memory.
> >> +  - ``socket_id`` parameter across the entire DPDK has gained
> additional
> >> +    meaning, as some socket ID's will now be representing externally
> >> allocated
> >> +    memory. No changes will be required for existing code as backwards
> >> +    compatibility will be kept, and those who do not use this feature
> >> will not
> >> +    see these extra socket ID's. Any new API's must not check socket ID
> >> +    parameters themselves, and must instead leave it to the memory
> >> subsystem to
> >> +    decide whether socket ID is a valid one.
> >>
> >>   ABI Changes
> >>   -----------
> >> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> >> b/lib/librte_eal/common/eal_common_memzone.c
> >> index 7300fe05d..b7081afbf 100644
> >> --- a/lib/librte_eal/common/eal_common_memzone.c
> >> +++ b/lib/librte_eal/common/eal_common_memzone.c
> >> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
> >> *name, size_t len,
> >>                  return NULL;
> >>          }
> >>
> >> -       if ((socket_id != SOCKET_ID_ANY) &&
> >> -           (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
> >> +       if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
> >>
> >
> > Should not it be better to use RTE_MAX_HEAP instead of removing the
> check?
>
> First of all, maximum number of heaps should not concern the rest of the
> code - this is purely internal detail of rte_malloc.
>
>
In a previous patch you say that:

"Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be order of their creation."

If I understand this right, heaps linked to physical sockets get a heap ID,
and then external heaps will get IDs starting from the highest socket/heap
ID + 1.
So, assuming RTE_MAX_HEAPS really is the maximum number of allowed heaps
(which does not seem to be the case, reading your next paragraph), it would
be a good sanity check to compare the socket ID against RTE_MAX_HEAPS.

> More importantly, socket ID is completely independent from number of
> heaps. Socket ID is incremented each time a new heap is created, and
> they are not reused. If you create and destroy a heap 100 times - you'll
> get 100 different socket ID's, even though max number of heaps is less
> than that.
>
>
I do not understand this. It is true that there is no check against
RTE_MAX_HEAPS when creating new heaps, so I am not sure what the limit refers
to. And then there is code, such as dumping heap info or getting info from a
heap based on socket ID, that will not work.


> >
> >
> >
> >>                  rte_errno = EINVAL;
> >>                  return NULL;
> >>          }
> >>
> >> -       if (!rte_eal_has_hugepages())
> >> +       /* only set socket to SOCKET_ID_ANY if we aren't allocating for
> an
> >> +        * external heap.
> >> +        */
> >> +       if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
> >>                  socket_id = SOCKET_ID_ANY;
> >>
> >>          contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
> >> diff --git a/lib/librte_eal/common/malloc_heap.c
> >> b/lib/librte_eal/common/malloc_heap.c
> >> index 1d1e35708..73e478076 100644
> >> --- a/lib/librte_eal/common/malloc_heap.c
> >> +++ b/lib/librte_eal/common/malloc_heap.c
> >> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
> >> socket_arg,
> >>          if (size == 0 || (align && !rte_is_power_of_2(align)))
> >>                  return NULL;
> >>
> >> -       if (!rte_eal_has_hugepages())
> >> +       if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
> >>                  socket_arg = SOCKET_ID_ANY;
> >>
> >>          if (socket_arg == SOCKET_ID_ANY)
> >> diff --git a/lib/librte_eal/common/rte_malloc.c
> >> b/lib/librte_eal/common/rte_malloc.c
> >> index 73d6df31d..9ba1472c3 100644
> >> --- a/lib/librte_eal/common/rte_malloc.c
> >> +++ b/lib/librte_eal/common/rte_malloc.c
> >> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
> >> unsigned int align,
> >>          if (!rte_eal_has_hugepages())
> >>                  socket_arg = SOCKET_ID_ANY;
> >>
> >> -       /* Check socket parameter */
> >> -       if (socket_arg >= RTE_MAX_NUMA_NODES)
> >> -               return NULL;
> >> -
> >>
> >
> > Sane than before. Better to keep the sanity check using RTE_MAX_HEAPS.
>
> same as above :)
>
>
> --
> Thanks,
> Anatoly
>


* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
  2018-09-27 13:42               ` Alejandro Lucero
@ 2018-09-27 14:04                 ` Burakov, Anatoly
  0 siblings, 0 replies; 225+ messages in thread
From: Burakov, Anatoly @ 2018-09-27 14:04 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
	Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko

On 27-Sep-18 2:42 PM, Alejandro Lucero wrote:
> 
> 
> On Thu, Sep 27, 2018 at 2:22 PM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
> 
>     On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
>      > On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov
>     <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>>
>      > wrote:
>      >
>      >> We will be assigning "invalid" socket ID's to external heap, and
>      >> malloc will now be able to verify if a supplied socket ID is in
>      >> fact a valid one, rendering parameter checks for sockets
>      >> obsolete.
>      >>
>      >> This changes the semantics of what we understand by "socket ID",
>      >> so document the change in the release notes.
>      >>
>      >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com
>     <mailto:anatoly.burakov@intel.com>>
>      >> ---
>      >>   doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
>      >>   lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
>      >>   lib/librte_eal/common/malloc_heap.c        | 2 +-
>      >>   lib/librte_eal/common/rte_malloc.c         | 4 ----
>      >>   4 files changed, 13 insertions(+), 8 deletions(-)
>      >>
>      >> diff --git a/doc/guides/rel_notes/release_18_11.rst
>      >> b/doc/guides/rel_notes/release_18_11.rst
>      >> index 5fc71e208..6ee236302 100644
>      >> --- a/doc/guides/rel_notes/release_18_11.rst
>      >> +++ b/doc/guides/rel_notes/release_18_11.rst
>      >> @@ -98,6 +98,13 @@ API Changes
>      >>       users of memseg-walk-related functions, as they will now
>     have to skip
>      >>       externally allocated segments in most cases if the intent
>     is to only
>      >> iterate
>      >>       over internal DPDK memory.
>      >> +  - ``socket_id`` parameter across the entire DPDK has gained
>     additional
>      >> +    meaning, as some socket ID's will now be representing
>     externally
>      >> allocated
>      >> +    memory. No changes will be required for existing code as
>     backwards
>      >> +    compatibility will be kept, and those who do not use this
>     feature
>      >> will not
>      >> +    see these extra socket ID's. Any new API's must not check
>     socket ID
>      >> +    parameters themselves, and must instead leave it to the memory
>      >> subsystem to
>      >> +    decide whether socket ID is a valid one.
>      >>
>      >>   ABI Changes
>      >>   -----------
>      >> diff --git a/lib/librte_eal/common/eal_common_memzone.c
>      >> b/lib/librte_eal/common/eal_common_memzone.c
>      >> index 7300fe05d..b7081afbf 100644
>      >> --- a/lib/librte_eal/common/eal_common_memzone.c
>      >> +++ b/lib/librte_eal/common/eal_common_memzone.c
>      >> @@ -120,13 +120,15 @@
>     memzone_reserve_aligned_thread_unsafe(const char
>      >> *name, size_t len,
>      >>                  return NULL;
>      >>          }
>      >>
>      >> -       if ((socket_id != SOCKET_ID_ANY) &&
>      >> -           (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
>      >> +       if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
>      >>
>      >
>      > Should not it be better to use RTE_MAX_HEAP instead of removing
>     the check?
> 
>     First of all, maximum number of heaps should not concern the rest of
>     the
>     code - this is purely internal detail of rte_malloc.
> 
> 
> In a previous patch you say that:
> 
> "Switch over all parts of EAL to use heap ID instead of NUMA node
> ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
> node's index within the detected NUMA node list. Heap ID for
> external heaps will be order of their creation."
> 
> If I understand this right, heaps linked to physical sockets get a heap 
> ID, and then external heaps will get IDs starting from the higher 
> socket/heap ID + 1.

Yes and no.

Socket ID is an externally visible identification of "where to allocate
from" (a heap). Heap ID is used internally. Normally, there is a 1:1
correspondence of NUMA node to heap ID, but there may be cases where
e.g. only NUMA nodes 0 and 7 are detected, so you'll have sockets 0 and 7
as valid socket IDs. However, these socket IDs will be internally
resolved into heap IDs 0 and 1, not 0 and 7.

So, in *most* cases, the socket ID for an internal heap is equal to its
heap ID, but only by coincidence. Heap ID is an internal identifier used
by the malloc heap, and it is not visible externally - it is only known
to malloc itself. Even memzone knows nothing about heap IDs - only
socket IDs.
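
The NUMA 0/7 case above can be illustrated with a small standalone sketch.
It is a toy model, not the DPDK implementation: toy_heap, toy_init_heaps()
and HEAP_UNUSED are hypothetical names, and only the linear search follows
malloc_socket_to_heap_id() from the patch.

```c
#include <assert.h>

#define MAX_HEAPS 32            /* stand-in for RTE_MAX_HEAPS */
#define HEAP_UNUSED 0xFFFFFFFFu /* hypothetical marker for a vacant slot */

/* Hypothetical, simplified heap descriptor. */
struct toy_heap {
	unsigned int socket_id; /* externally visible identifier */
};

static struct toy_heap heaps[MAX_HEAPS];

/* Assign heap slots to the detected NUMA nodes, in detection order. */
static void
toy_init_heaps(const unsigned int *nodes, int n_nodes)
{
	int i;

	assert(n_nodes >= 0 && n_nodes <= MAX_HEAPS);
	for (i = 0; i < MAX_HEAPS; i++)
		heaps[i].socket_id = i < n_nodes ? nodes[i] : HEAP_UNUSED;
}

/* Resolve a socket ID to its index in the heap array, or -1. */
static int
toy_socket_to_heap_id(unsigned int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

After toy_init_heaps() is called with the node list {0, 7}, socket 0
resolves to heap index 0 and socket 7 to heap index 1, while socket 1 (not
detected) resolves to -1.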

> So, assuming RTE_MAX_HEAPS is really the maximum number of allowed heaps 
> (which does not seem so reading your next paragraph), it would be a good 
> sanity check to use RTE_MAX_HEAPS for the socket id.
> 
>     More importantly, socket ID is completely independent from number of
>     heaps. Socket ID is incremented each time a new heap is created, and
>     they are not reused. If you create and destroy a heap 100 times -
>     you'll
>     get 100 different socket ID's, even though max number of heaps is less
>     than that.
> 
> 
> I do not understand this. It is true there is no check regarding 
> RTE_MAX_HEAPS when creating new heaps,

There is one :) RTE_MAX_HEAPS is the length of the malloc heaps array
(in shared memory). If we cannot find a vacant spot in the heaps array,
the heap will not be created.

However, *socket ID* is indeed limited only by INT_MAX. Socket ID is not
heap ID - socket ID is an externally visible identifier. Multiple socket
IDs can resolve to the same heap ID.

For example, if you create and destroy a heap 5 times one after the
other, you'll get 5 different socket IDs, but all of them will have
pointed to the same heap ID (though not at the same time).

So, semantically speaking, heap ID isn't really "an ID" as such, it's an
index into the heap array. Unlike socket ID, it has no meaning of its own.
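
The create/destroy behaviour described above can be sketched as a small
standalone toy model. None of this is the actual DPDK code: the function
names, the use of an empty name to mark a vacant slot, and the starting
value FIRST_EXT_SOCKET are all assumptions made for illustration (the
patch only implies that external socket IDs lie at or above
RTE_MAX_NUMA_NODES).

```c
#include <string.h>

#define MAX_HEAPS 4        /* stand-in for RTE_MAX_HEAPS, kept small */
#define FIRST_EXT_SOCKET 8 /* hypothetical first external socket ID */

/* Hypothetical, simplified heap descriptor. */
struct toy_heap {
	char name[32]; /* assumption: empty name marks a vacant slot */
	int socket_id; /* externally visible identifier */
};

static struct toy_heap heaps[MAX_HEAPS];
static int next_socket_id = FIRST_EXT_SOCKET; /* never decremented */

/* Resolve a socket ID to an index in the heap array, or -1. */
static int
toy_socket_to_heap_id(int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (heaps[i].name[0] != '\0' && heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}

/* Create a named heap; return its socket ID, or -1 on failure. */
static int
toy_heap_create(const char *name)
{
	int i;

	if (name == NULL || name[0] == '\0')
		return -1;
	for (i = 0; i < MAX_HEAPS; i++) {
		if (heaps[i].name[0] == '\0') {
			strncpy(heaps[i].name, name, sizeof(heaps[i].name) - 1);
			heaps[i].name[sizeof(heaps[i].name) - 1] = '\0';
			/* a fresh socket ID every time, even if the slot is reused */
			heaps[i].socket_id = next_socket_id++;
			return heaps[i].socket_id;
		}
	}
	return -1; /* no vacant slot: the array length is the real limit */
}

/* Destroy the heap behind a socket ID, freeing its slot for reuse. */
static int
toy_heap_destroy(int socket_id)
{
	int idx = toy_socket_to_heap_id(socket_id);

	if (idx < 0)
		return -1;
	memset(&heaps[idx], 0, sizeof(heaps[idx]));
	return 0;
}
```

Creating, destroying and re-creating a heap in this model yields two
different socket IDs that resolved to the same array index at different
times, and creation fails only once all MAX_HEAPS slots are occupied.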

> then nor sure what the limit 
> refers to. And then there is code like dumping heaps info or getting 
> info from the heap based on socket id that will not work.

It is probably unclear because the ordering of this patchset is not
ideal (and I'm not sure how to make it any better).

The code for dumping or getting heap info accepts a socket ID, but it
translates it into a heap ID, because that's what malloc uses internally
to differentiate between the heaps. Heap ID is there to break the
dependency between NUMA node ID and position in the malloc heap array.

-- 
Thanks,
Anatoly


* Re: [dpdk-dev] [PATCH v6 05/21] flow_classify: do not check for invalid socket ID
  2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 05/21] flow_classify: " Anatoly Burakov
@ 2018-09-27 16:14           ` Iremonger, Bernard
  0 siblings, 0 replies; 225+ messages in thread
From: Iremonger, Bernard @ 2018-09-27 16:14 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, Wiles, Keith, Richardson, Bruce,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Thursday, September 27, 2018 11:41 AM
> To: dev@dpdk.org
> Cc: Iremonger, Bernard <bernard.iremonger@intel.com>;
> laszlo.madarassy@ericsson.com; laszlo.vadkerti@ericsson.com;
> andras.kovacs@ericsson.com; winnie.tian@ericsson.com;
> daniel.andrasi@ericsson.com; janos.kobor@ericsson.com;
> geza.koblo@ericsson.com; srinath.mannam@broadcom.com;
> scott.branden@broadcom.com; ajit.khaparde@broadcom.com; Wiles, Keith
> <keith.wiles@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> thomas@monjalon.net; shreyansh.jain@nxp.com; shahafs@mellanox.com;
> arybchenko@solarflare.com; alejandro.lucero@netronome.com
> Subject: [PATCH v6 05/21] flow_classify: do not check for invalid socket ID
> 
> We will be assigning "invalid" socket ID's to external heap, and malloc will now
> be able to verify if a supplied socket ID is in fact a valid one, rendering
> parameter checks for sockets obsolete.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>

* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
  2018-09-27 11:03           ` Shreyansh Jain
@ 2018-09-29  0:09           ` Yongseok Koh
  1 sibling, 0 replies; 225+ messages in thread
From: Yongseok Koh @ 2018-09-29  0:09 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, Thomas Monjalon, alejandro.lucero

On Thu, Sep 27, 2018 at 11:40:59AM +0100, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
> 
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes. This also breaks a few
> internal assumptions about memory contiguousness, so adjust
> malloc code in a few places.
> 
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
> 
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
> 
> Notes:
>     v3:
>     - Add comment to explain the process of picking up minimum
>       page sizes for mempool
>     
>     v2:
>     - Add documentation changes and ABI break
>     
>     v1:
>     - Adjust all calls to memseg walk functions to ignore external
>       segments where it made sense to do so
> 
>  doc/guides/rel_notes/deprecation.rst          | 15 --------
>  doc/guides/rel_notes/release_18_11.rst        | 13 ++++++-
>  drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
>  drivers/net/mlx4/mlx4_mr.c                    |  3 ++
>  drivers/net/mlx5/mlx5.c                       |  5 ++-
>  drivers/net/mlx5/mlx5_mr.c                    |  3 ++
>  drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
>  lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
>  lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
>  lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
>  lib/librte_eal/common/eal_common_memory.c     |  3 ++
>  .../common/include/rte_eal_memconfig.h        |  1 +
>  lib/librte_eal/common/include/rte_memory.h    |  9 +++++
>  lib/librte_eal/common/malloc_elem.c           | 10 ++++--
>  lib/librte_eal/common/malloc_heap.c           |  9 +++--
>  lib/librte_eal/common/rte_malloc.c            |  2 +-
>  lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
>  lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
>  lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
>  lib/librte_eal/meson.build                    |  2 +-
>  lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
>  test/test/test_malloc.c                       |  3 ++
>  test/test/test_memzone.c                      |  3 ++
>  24 files changed, 134 insertions(+), 44 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 138335dfb..d2aec64d1 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
>  
> -* eal: certain structures will change in EAL on account of upcoming external
> -  memory support. Aside from internal changes leading to an ABI break, the
> -  following externally visible changes will also be implemented:
> -
> -  - ``rte_memseg_list`` will change to include a boolean flag indicating
> -    whether a particular memseg list is externally allocated. This will have
> -    implications for any users of memseg-walk-related functions, as they will
> -    now have to skip externally allocated segments in most cases if the intent
> -    is to only iterate over internal DPDK memory.
> -  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
> -    as some socket ID's will now be representing externally allocated memory. No
> -    changes will be required for existing code as backwards compatibility will
> -    be kept, and those who do not use this feature will not see these extra
> -    socket ID's.
> -
>  * eal: both declaring and identifying devices will be streamlined in v18.11.
>    New functions will appear to query a specific port from buses, classes of
>    device and device drivers. Device declaration will be made coherent with the
> diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
> index bc9b74ec4..5fc71e208 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -91,6 +91,13 @@ API Changes
>    flag the MAC can be properly configured in any case. This is particularly
>    important for bonding.
>  
> +* eal: The following API changes were made in 18.11:
> +
> +  - ``rte_memseg_list`` structure now has an additional flag indicating whether
> +    the memseg list is externally allocated. This will have implications for any
> +    users of memseg-walk-related functions, as they will now have to skip
> +    externally allocated segments in most cases if the intent is to only iterate
> +    over internal DPDK memory.
>  
>  ABI Changes
>  -----------
> @@ -107,6 +114,10 @@ ABI Changes
>     =========================================================
>  
>  
> +* eal: EAL library ABI version was changed due to previously announced work on
> +       supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
> +       a new flag indicating whether the memseg list refers to external memory.
> +
>  Removed Items
>  -------------
>  
> @@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
>       librte_compressdev.so.1
>       librte_cryptodev.so.5
>       librte_distributor.so.1
> -     librte_eal.so.8
> +   + librte_eal.so.9
>       librte_ethdev.so.10
>       librte_eventdev.so.4
>       librte_flow_classify.so.1
> diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
> index 4c2cd2a87..2e9244fb7 100644
> --- a/drivers/bus/fslmc/fslmc_vfio.c
> +++ b/drivers/bus/fslmc/fslmc_vfio.c
> @@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
>  }
>  
>  static int
> -fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
> -		 const struct rte_memseg *ms, void *arg)
> +fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> +		void *arg)
>  {
>  	int *n_segs = arg;
>  	int ret;
>  
> +	if (msl->external)
> +		return 0;
> +
>  	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
>  	if (ret)
>  		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
> diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
> index d23d3c613..9f5d790b6 100644
> --- a/drivers/net/mlx4/mlx4_mr.c
> +++ b/drivers/net/mlx4/mlx4_mr.c
> @@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
>  {
>  	struct mr_find_contig_memsegs_data *data = arg;
>  
> +	if (msl->external)
> +		return 0;
> +

Because a memory free event is available for external memory, the current
design of mlx4/mlx5 memory management can accommodate the new external memory
support. So, please remove this check so that the PMD can traverse external
memory as well.

>  	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
>  		return 0;
>  	/* Found, save it and stop walking. */
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 30d4e70a7..c90e1d8ce 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
>  static void *uar_base;
>  
>  static int
> -find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
> +find_lower_va_bound(const struct rte_memseg_list *msl,
>  		const struct rte_memseg *ms, void *arg)
>  {
>  	void **addr = arg;
>  
> +	if (msl->external)
> +		return 0;
> +

This one is fine.
But can you please remove the blank line?
That's a rule by former maintainers. :-)

>  	if (*addr == NULL)
>  		*addr = ms->addr;
>  	else
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index 1d1bcb5fe..fd4345f9c 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
>  {
>  	struct mr_find_contig_memsegs_data *data = arg;
>  
> +	if (msl->external)
> +		return 0;
> +

Like I mentioned, please remove it.

If those two changes in mlx4/5_mr.c are removed, for the whole patch,

Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks
Yongseok

* [dpdk-dev] [PATCH v7 00/21] Support externally allocated memory in DPDK
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                               ` (21 more replies)
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 01/21] mem: add length to memseg list Anatoly Burakov
                             ` (20 subsequent siblings)
  21 siblings, 22 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.

Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

v7 -> v6 changes:
- Fixed missing IOVA address setup in testpmd
- Fixed MLX drivers as per Yongseok's comments
- Added a check for invalid heap idx on adding memory to heap

v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments

v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes

v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
  IOVA-contiguousness when dealing with externally allocated memory

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
  comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (21):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: add function to check if socket is external
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  malloc: enable event callbacks for external memory
  test: add unit tests for external memory support
  app/testpmd: add support for external memory
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide

 app/test-pmd/config.c                         |  21 +-
 app/test-pmd/parameters.c                     |  23 +-
 app/test-pmd/testpmd.c                        | 318 ++++++++++++-
 app/test-pmd/testpmd.h                        |  13 +-
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  37 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  28 +-
 doc/guides/testpmd_app_ug/run_app.rst         |  12 +
 drivers/bus/fslmc/fslmc_vfio.c                |  13 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx5/mlx5.c                       |   4 +-
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 .../net/virtio/virtio_user/virtio_user_dev.c  |   8 +
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   9 +-
 lib/librte_eal/common/include/rte_malloc.h    | 192 ++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_elem.c           |  10 +-
 lib/librte_eal/common/malloc_heap.c           | 320 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 429 +++++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  27 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   8 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  57 ++-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 ++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 45 files changed, 1923 insertions(+), 138 deletions(-)
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1

* [dpdk-dev] [PATCH v7 01/21] mem: add length to memseg list
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
                             ` (19 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, shreyansh.jain, shahafs, arybchenko,
	alejandro.lucero

Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.
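
As an illustration of the difference, here is a minimal sketch using
simplified stand-ins for the structures involved. The types
fbarray_model and memseg_list_model and the helper names are hypothetical;
only the field names mirror the ones touched by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical mirror of the fields involved. */
struct fbarray_model {
	unsigned int len; /* number of page slots in the backing array */
};

struct memseg_list_model {
	void *base_va;
	uint64_t page_sz;
	struct fbarray_model memseg_arr;
	size_t len; /* length of the covered area, stored at setup time */
};

/* Old approach: derive the length from the backing array on every use. */
static size_t msl_len_derived(const struct memseg_list_model *msl)
{
	return (size_t)msl->page_sz * msl->memseg_arr.len;
}

/* Set up a list and store its length once. */
static void msl_init(struct memseg_list_model *msl, void *base,
		uint64_t page_sz, unsigned int n_pages)
{
	msl->base_va = base;
	msl->page_sz = page_sz;
	msl->memseg_arr.len = n_pages;
	msl->len = (size_t)page_sz * n_pages;
	/* both ways of computing the length must agree */
	assert(msl->len == msl_len_derived(msl));
}

/* New approach: compute the end address from the stored length. */
static void *msl_end_va(const struct memseg_list_model *msl)
{
	return (char *)msl->base_va + msl->len;
}
```

Callers that need the end of the area then read a single field instead of
knowing how the list is backed.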

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 drivers/bus/pci/linux/pci.c                       | 2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
 lib/librte_eal/common/eal_common_memory.c         | 5 ++---
 lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
 6 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 static int
 find_max_end_va(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t sz = msl->memseg_arr.len * msl->page_sz;
+	size_t sz = msl->len;
 	void *end_va = RTE_PTR_ADD(msl->base_va, sz);
 	void **max_va = arg;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
 		}
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
+		msl->len = internal_config.memory;
 		msl->socket_id = 0;
 
 		/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
 
 	/* a memseg list was specified, check if it's the right one */
 	start = msl->base_va;
-	end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+	end = RTE_PTR_ADD(start, msl->len);
 
 	if (addr < start || addr >= end)
 		return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
 		msl = &mcfg->memsegs[msl_idx];
 
 		start = msl->base_va;
-		end = RTE_PTR_ADD(start,
-				(size_t)msl->page_sz * msl->memseg_arr.len);
+		end = RTE_PTR_ADD(start, msl->len);
 		if (addr >= start && addr < end)
 			break;
 	}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
+	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	int msl_idx, seg_idx, ret, dir_fd = -1;
 
 	start_addr = (uintptr_t) msl->base_va;
-	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+	end_addr = start_addr + msl->len;
 
 	if ((uintptr_t)wa->ms->addr < start_addr ||
 			(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 		return -1;
 	}
 	local_msl->base_va = primary_msl->base_va;
+	local_msl->len = primary_msl->len;
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
 		msl->socket_id = 0;
+		msl->len = internal_config.memory;
 
 		/* populate memsegs. each memseg is one page long */
 		for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
 		if (msl->memseg_arr.count > 0)
 			continue;
 		/* this is an unused list, deallocate it */
-		mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
 
-- 
2.17.1

* [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
                             ` (18 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
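
The pattern applied to those callbacks can be sketched as follows. This is
a self-contained toy model rather than the EAL walk API: msl_model,
msl_walk() and sum_internal_len() are hypothetical names, and the
convention that a non-zero return stops the walk is an assumption of the
sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified memseg list: only the fields the sketch needs. */
struct msl_model {
	int external; /* non-zero if the memory was allocated externally */
	size_t len;   /* length of the memory area covered by the list */
};

typedef int (*msl_walk_t)(const struct msl_model *msl, void *arg);

/* Walk helper: stop early when the callback returns non-zero. */
static int msl_walk(const struct msl_model *lists, size_t n,
		msl_walk_t cb, void *arg)
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret = cb(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback: sum the length of internal lists, skipping external ones. */
static int sum_internal_len(const struct msl_model *msl, void *arg)
{
	size_t *total = arg;

	if (msl->external)
		return 0; /* 0 means "keep walking" */
	*total += msl->len;
	return 0;
}
```

A callback that does want to see external memory simply omits the early
return.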

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
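
A minimal sketch of that matching rule, under stated assumptions: the
msl_model type, the ANY_SOCKET constant (standing in for SOCKET_ID_ANY) and
min_page_size() are hypothetical, and the choice to consider only internal
memory when no specific socket is requested is an assumption of this
sketch, not something spelled out in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_SOCKET (-1) /* hypothetical stand-in for SOCKET_ID_ANY */

/* Hypothetical, simplified memseg list. */
struct msl_model {
	int socket_id;
	uint64_t page_sz;
	int external; /* non-zero if the memory was allocated externally */
};

/*
 * Return the smallest page size among lists relevant to socket_id,
 * or UINT64_MAX if no list matched.
 */
static uint64_t min_page_size(const struct msl_model *lists, size_t n,
		int socket_id)
{
	uint64_t min = UINT64_MAX;

	assert(lists != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct msl_model *msl = &lists[i];
		/* exact socket match, or "any socket" limited to internal memory */
		int match = msl->socket_id == socket_id ||
			(socket_id == ANY_SOCKET && !msl->external);

		if (match && msl->page_sz < min)
			min = msl->page_sz;
	}
	return min;
}
```

With this rule, a small page size belonging to an external heap no longer
lowers the minimum computed for a mempool placed on a regular NUMA socket.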

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        | 13 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  6 +++-
 drivers/net/mlx5/mlx5.c                       |  4 ++-
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_elem.c           | 10 ++++--
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/common/rte_malloc.c            |  2 +-
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 22 files changed, 127 insertions(+), 43 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -107,6 +114,10 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+       a new flag indicating whether the memseg list refers to external memory.
+
 Removed Items
 -------------
 
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..cb33dd891 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -318,11 +318,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 
 static int
 fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+		const struct rte_memseg *ms, void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	/* if IOVA address is invalid, skip */
+	if (ms->iova == RTE_BAD_IOVA)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..fc3cb1b49 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,13 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 	contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
 
 	/* if we're in IOVA as VA mode, or if we're in legacy mode with
-	 * hugepages, all elements are IOVA-contiguous.
+	 * hugepages, all elements are IOVA-contiguous. however, we can only
+	 * make these assumptions about internal memory - externally allocated
+	 * segments have to be checked.
 	 */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA ||
-			(internal_config.legacy_mem && rte_eal_has_hugepages()))
+	if (!elem->msl->external &&
+			(rte_eal_iova_mode() == RTE_IOVA_VA ||
+				(internal_config.legacy_mem &&
+					rte_eal_has_hugepages())))
 		return RTE_PTR_DIFF(data_end, contig_seg_start);
 
 	cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
 	if (elem == NULL)
 		return RTE_BAD_IOVA;
 
-	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+	if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return (uintptr_t) addr;
 
 	ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1
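
Note on the walk callbacks touched above: the page-size selection added to
rte_mempool.c can be sketched in isolation as below. This is a minimal,
self-contained illustration, not the EAL code itself. The sketch_* names, the
trimmed-down list structure, the mock walker and the explicit fallback
argument are all assumptions made so the example compiles without DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* stand-in for SOCKET_ID_ANY */

/* Hypothetical, trimmed-down stand-in for struct rte_memseg_list. */
struct sketch_msl {
	int socket_id;         /* socket ID for all segments in this list */
	uint64_t page_sz;      /* page size for all segments in this list */
	unsigned int external; /* 1 if this list points to external memory */
};

struct sketch_walk_arg {
	int socket_id; /* socket the caller is interested in */
	size_t min;    /* smallest matching page size seen so far */
};

typedef int (*sketch_walk_t)(const struct sketch_msl *msl, void *arg);

/* Mock walker: calls func on every list, stops on a non-zero return. */
static int
sketch_list_walk(const struct sketch_msl *lists, size_t n,
		sketch_walk_t func, void *arg)
{
	size_t i;

	assert(lists != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/*
 * Callback: an exact socket ID match accepts both native and external
 * lists; SOCKET_ID_ANY accepts native lists only.
 */
static int
sketch_find_min_pagesz(const struct sketch_msl *msl, void *arg)
{
	struct sketch_walk_arg *wa = arg;
	int valid;

	valid = msl->socket_id == wa->socket_id;
	valid |= wa->socket_id == SKETCH_SOCKET_ID_ANY && msl->external == 0;

	if (valid && msl->page_sz < wa->min)
		wa->min = (size_t)msl->page_sz;

	return 0;
}

/* Returns the smallest matching page size, or fallback if none matched. */
size_t
sketch_min_page_size(const struct sketch_msl *lists, size_t n, int socket_id,
		size_t fallback)
{
	struct sketch_walk_arg wa;

	wa.min = SIZE_MAX;
	wa.socket_id = socket_id;

	sketch_list_walk(lists, n, sketch_find_min_pagesz, &wa);

	return wa.min == SIZE_MAX ? fallback : wa.min;
}
```

With two native lists (2M pages on socket 0, 1G pages on socket 1) and one
external list with 4K pages, a SOCKET_ID_ANY query only sees the native page
sizes, while a query for the external list's own socket ID sees the 4K pages.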

* [dpdk-dev] [PATCH v7 03/21] malloc: index heaps using heap ID rather than NUMA node
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (2 preceding siblings ...)
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID Anatoly Burakov
                             ` (17 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bruce Richardson, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, shreyansh.jain, shahafs, arybchenko,
	alejandro.lucero

Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. The heap ID of a DPDK-internal heap is its NUMA
node's index within the detected NUMA node list. Heap IDs for
external heaps are assigned in order of their creation.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
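Note: the socket ID to heap ID lookup described in the commit message can be
sketched standalone as below. This is only an illustration under stated
assumptions: the sketch_* names, the separate table type, the in_use flag and
the registration helper are hypothetical and are not part of this patch, which
scans mcfg->malloc_heaps directly.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_HEAPS 32 /* mirrors RTE_MAX_HEAPS from this patch */

/* Hypothetical stand-in for struct malloc_heap. */
struct sketch_heap {
	unsigned int socket_id; /* socket ID this heap serves */
	int in_use;             /* assumption: the patch has no such flag */
};

struct sketch_heap_table {
	struct sketch_heap heaps[SKETCH_MAX_HEAPS];
};

/* Mark every slot in the table as free. */
void
sketch_heap_table_init(struct sketch_heap_table *t)
{
	int i;

	assert(t != NULL);
	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		t->heaps[i].socket_id = 0;
		t->heaps[i].in_use = 0;
	}
}

/*
 * Claim the first free slot for the given socket ID. Heap IDs are
 * therefore handed out in registration order. Returns the heap ID,
 * or -1 if the table is full.
 */
int
sketch_heap_register(struct sketch_heap_table *t, unsigned int socket_id)
{
	int i;

	assert(t != NULL);
	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		if (!t->heaps[i].in_use) {
			t->heaps[i].in_use = 1;
			t->heaps[i].socket_id = socket_id;
			return i;
		}
	}
	return -1;
}

/* Linear scan for the heap serving socket_id; -1 if there is none. */
int
sketch_socket_to_heap_id(const struct sketch_heap_table *t,
		unsigned int socket_id)
{
	int i;

	assert(t != NULL);
	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		if (t->heaps[i].in_use && t->heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

If internal heaps are registered first, in detected NUMA node order, their heap
IDs equal the node's index in that list, and any heap registered afterwards
simply takes the next free ID. A socket ID with no heap yields -1, which the
callers in this patch treat as an invalid socket.
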
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../common/include/rte_eal_memconfig.h        |   4 +-
 .../common/include/rte_malloc_heap.h          |   1 +
 lib/librte_eal/common/malloc_heap.c           | 102 +++++++++++++-----
 lib/librte_eal/common/malloc_heap.h           |   3 +
 lib/librte_eal/common/rte_malloc.c            |  41 ++++---
 7 files changed, 110 insertions(+), 43 deletions(-)

diff --git a/config/common_base b/config/common_base
index 155c7d40e..b52770b27 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 567051b9c..5dd2ac1ad 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
 #define RTE_BUILD_SHARED_LIB
 
 /* EAL defines */
+#define RTE_MAX_HEAPS 32
 #define RTE_MAX_MEMSEG_LISTS 128
 #define RTE_MAX_MEMSEG_PER_LIST 8192
 #define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 6baa6854f..d7920a4e0 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
 
-	/* Heaps of Malloc per socket */
-	struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+	/* Heaps of Malloc */
+	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {
 
 	unsigned alloc_count;
 	size_t total_size;
+	unsigned int socket_id;
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..a9cfa423f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 	return check_flag & flags;
 }
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (heap->socket_id == socket_id)
+			return i;
+	}
+	return -1;
+}
+
 /*
  * Expand the heap with a memory area.
  */
@@ -93,12 +108,17 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct rte_memseg_list *found_msl;
 	struct malloc_heap *heap;
-	int msl_idx;
+	int msl_idx, heap_idx;
 
 	if (msl->external)
 		return 0;
 
-	heap = &mcfg->malloc_heaps[msl->socket_id];
+	heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+	if (heap_idx < 0) {
+		RTE_LOG(ERR, EAL, "Memseg list has invalid socket id\n");
+		return -1;
+	}
+	heap = &mcfg->malloc_heaps[heap_idx];
 
 	/* msl is const, so find it */
 	msl_idx = msl - mcfg->memsegs;
@@ -111,6 +131,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
+	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -561,12 +582,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 
 /* this will try lower page sizes first */
 static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
-		unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+	int socket_id;
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -584,12 +607,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
 	 * we may still be able to allocate memory from appropriate page sizes,
 	 * we just need to request more memory first.
 	 */
+
+	socket_id = rte_socket_id_by_idx(heap_id);
+	/*
+	 * if socket ID is negative, we cannot find a socket ID for this heap -
+	 * which means it's an external heap. those can have unexpected page
+	 * sizes, so if the user asked to allocate from there - assume user
+	 * knows what they're doing, and allow allocating from there with any
+	 * page size flags.
+	 */
+	if (socket_id < 0)
+		size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
 	ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
 	if (ret != NULL)
 		goto alloc_unlock;
 
-	if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
-			contig)) {
+	/* if socket ID is invalid, this is an external heap */
+	if (socket_id < 0)
+		goto alloc_unlock;
+
+	if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+			bound, contig)) {
 		ret = heap_alloc(heap, type, size, flags, align, bound, contig);
 
 		/* this should have succeeded */
@@ -605,7 +644,7 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, heap_id, i;
 	void *ret;
 
 	/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +659,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
-			contig);
+	ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+			bound, contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
-	/* try other heaps */
+	/* try other heaps. we are only iterating through native DPDK sockets,
+	 * so external heaps won't be included.
+	 */
 	for (i = 0; i < (int) rte_socket_count(); i++) {
-		cur_socket = rte_socket_id_by_idx(i);
-		if (cur_socket == socket)
+		if (i == heap_id)
 			continue;
-		ret = heap_alloc_on_socket(type, size, cur_socket, flags,
-				align, bound, contig);
+		ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+				bound, contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -643,11 +685,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 }
 
 static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
-		size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+		unsigned int flags, size_t align, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -665,7 +707,7 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		size_t align, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, i, cur_socket, heap_id;
 	void *ret;
 
 	/* return NULL if align is not power-of-2 */
@@ -680,11 +722,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+	ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
 			contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
@@ -694,8 +738,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		cur_socket = rte_socket_id_by_idx(i);
 		if (cur_socket == socket)
 			continue;
-		ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
-				align, contig);
+		ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+				contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -760,7 +804,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* ...of which we can't avail if we are in legacy mode, or if this is an
 	 * externally allocated segment.
 	 */
-	if (internal_config.legacy_mem || msl->external)
+	if (internal_config.legacy_mem || (msl->external > 0))
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
@@ -917,7 +961,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 int
 malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +999,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f);
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
 int
 rte_eal_malloc_heap_init(void);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 47ca5a742..73d6df31d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx, ret = -1;
 
-	if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
-		return -1;
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
 
-	return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+	heap_idx = malloc_socket_to_heap_id(socket);
+	if (heap_idx < 0)
+		goto unlock;
+
+	ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+			socket_stats);
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
 }
 
 /*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int idx;
 
-	for (idx = 0; idx < rte_socket_count(); idx++) {
-		unsigned int socket = rte_socket_id_by_idx(idx);
-		fprintf(f, "Heap on socket %i:\n", socket);
-		malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		fprintf(f, "Heap id: %u\n", idx);
+		malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
 	}
 
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
 /*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
 void
 rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 {
-	unsigned int socket;
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int heap_id;
 	struct rte_malloc_socket_stats sock_stats;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
 	/* Iterate through all initialised heaps */
-	for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
-		if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
-			continue;
+	for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 
-		fprintf(f, "Socket:%u\n", socket);
+		malloc_heap_get_stats(heap, &sock_stats);
+
+		fprintf(f, "Heap id:%u\n", heap_id);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
 		fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
 	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	return;
 }
 
-- 
2.17.1

* [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (3 preceding siblings ...)
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 05/21] flow_classify: " Anatoly Burakov
                             ` (16 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
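Note: the relaxed parameter handling in memzone reserve can be sketched
standalone as below. This is a minimal illustration, not the EAL code. The
function name, the has_hugepages argument (standing in for
rte_eal_has_hugepages()) and the SKETCH_* constants are assumptions made so
the example builds without DPDK.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SOCKET_ID_ANY (-1)   /* stand-in for SOCKET_ID_ANY */
#define SKETCH_MAX_NUMA_NODES 8     /* stand-in for RTE_MAX_NUMA_NODES */

/*
 * Decide which socket ID to pass on to the allocator.
 * Returns 0 and stores the result in *out, or -1 if the socket ID is
 * rejected up front. Only negative values other than SOCKET_ID_ANY are
 * rejected here; whether a non-negative ID refers to a real heap is left
 * to the memory subsystem.
 */
int
sketch_memzone_socket(int socket_id, int has_hugepages, int *out)
{
	assert(out != NULL);

	if (socket_id != SKETCH_SOCKET_ID_ANY && socket_id < 0)
		return -1;

	/*
	 * Without hugepages, native socket IDs collapse to SOCKET_ID_ANY.
	 * IDs at or above the NUMA node limit are left alone, because they
	 * may refer to an external heap.
	 */
	if (!has_hugepages && socket_id < SKETCH_MAX_NUMA_NODES)
		socket_id = SKETCH_SOCKET_ID_ANY;

	*out = socket_id;
	return 0;
}
```

The upper-bound check against the NUMA node limit is gone, so a large socket
ID is passed through unchanged and is only rejected later if no heap is
registered for it.
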
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - The ``socket_id`` parameter across the entire DPDK has gained additional
+    meaning, as some socket IDs will now represent externally allocated
+    memory. No changes will be required for existing code as backwards
+    compatibility will be kept, and those who do not use this feature will not
+    see these extra socket IDs. Any new APIs must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether a socket ID is valid.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index a9cfa423f..09b06061d 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -651,7 +651,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1
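
The relaxed checks in this patch can be sketched in isolation. The following is a minimal model, not the DPDK code: the `SKETCH_` constants and both function names are hypothetical stand-ins, and the hugepage state is passed in as a flag rather than queried through rte_eal_has_hugepages().

```c
#include <stdbool.h>

/* Stand-in values; the real ones come from the DPDK build configuration. */
#define SKETCH_SOCKET_ID_ANY (-1)
#define SKETCH_MAX_NUMA_NODES 8

/*
 * Caller-side check after this patch: only negative IDs other than
 * "any" are rejected. The upper bound is left to the memory subsystem.
 */
bool
sketch_socket_arg_ok(int socket_id)
{
	return socket_id == SKETCH_SOCKET_ID_ANY || socket_id >= 0;
}

/*
 * In no-huge mode, NUMA socket IDs collapse to "any", but IDs at or above
 * the NUMA node limit are kept because they may name an external heap.
 */
int
sketch_effective_socket(int socket_id, bool has_hugepages)
{
	if (!has_hugepages && socket_id < SKETCH_MAX_NUMA_NODES)
		return SKETCH_SOCKET_ID_ANY;
	return socket_id;
}
```

The second function mirrors the condition added to memzone_reserve_aligned_thread_unsafe() and malloc_heap_alloc() in the diff above.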

* [dpdk-dev] [PATCH v7 05/21] flow_classify: do not check for invalid socket ID
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (4 preceding siblings ...)
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 06/21] pipeline: " Anatoly Burakov
                             ` (15 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: Bernard Iremonger, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
 lib/librte_flow_classify/rte_flow_classify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_FLOW_CLASSIFY_LOG(ERR,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1

* [dpdk-dev] [PATCH v7 06/21] pipeline: do not check for invalid socket ID
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (5 preceding siblings ...)
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 05/21] flow_classify: " Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 07/21] sched: " Anatoly Burakov
                             ` (14 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 lib/librte_pipeline/rte_pipeline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_LOG(ERR, PIPELINE,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1

* [dpdk-dev] [PATCH v7 07/21] sched: do not check for invalid socket ID
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (6 preceding siblings ...)
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 06/21] pipeline: " Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps Anatoly Burakov
                             ` (13 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 lib/librte_sched/rte_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
 		return -1;
 
 	/* socket */
-	if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+	if (params->socket < 0)
 		return -3;
 
 	/* rate */
-- 
2.17.1

* [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (7 preceding siblings ...)
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 07/21] sched: " Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
                             ` (12 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

We will need to refer to external heaps in some way. While we use
heap IDs internally, the external API needs something more
user-friendly. So, we will be using a string to uniquely
identify a heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst          |  1 +
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 17 ++++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
 * eal: EAL library ABI version was changed due to previously announced work on
        supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
        a new flag indicating whether the memseg list refers to external memory.
+       Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	size_t total_size;
 	unsigned int socket_id;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 09b06061d..b28905817 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -131,7 +131,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1024,6 +1023,22 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign names to default DPDK heaps */
+		for (i = 0; i < rte_socket_count(); i++) {
+			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+			char heap_name[RTE_HEAP_NAME_MAX_LEN];
+			int socket_id = rte_socket_id_by_idx(i);
+
+			snprintf(heap_name, sizeof(heap_name) - 1,
+					"socket_%i", socket_id);
+			strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+			heap->socket_id = socket_id;
+		}
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1

* [dpdk-dev] [PATCH v7 09/21] malloc: add function to query socket ID of named heap
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (8 preceding siblings ...)
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 10/21] malloc: add function to check if socket is external Anatoly Burakov
                             ` (11 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

When we create external heaps, they will have their own "fake"
socket IDs, so add a function that maps a heap name to its
socket ID.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 37 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..8870732a6 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,20 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Find socket ID corresponding to a named heap.
+ *
+ * @param name
+ *   Heap name to find socket ID for
+ * @return
+ *   Socket ID in case of success (a non-negative number)
+ *   -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``name`` was NULL
+ *     ENOENT - heap identified by the name ``name`` was not found
+ */
+int __rte_experimental
+rte_malloc_heap_get_socket(const char *name);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72632da56..b807dfe09 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <sys/queue.h>
 
+#include <rte_errno.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
 #include <rte_eal.h>
@@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f)
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
+int
+rte_malloc_heap_get_socket(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int idx;
+	int ret;
+
+	if (name == NULL ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) {
+			heap = tmp;
+			break;
+		}
+	}
+
+	if (heap != NULL) {
+		ret = heap->socket_id;
+	} else {
+		rte_errno = ENOENT;
+		ret = -1;
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bbb0..d8f9665b8 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1
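
The name-to-socket lookup from patches 08 and 09 can be modelled without DPDK as shown below. This is a sketch under stated assumptions: the `sketch_` names and table size are hypothetical, default heaps are assumed to have socket IDs equal to their index (the patch uses rte_socket_id_by_idx() instead), errno stands in for rte_errno, and the hotplug lock is omitted.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_HEAPS 4
#define SKETCH_HEAP_NAME_MAX_LEN 32

struct sketch_heap {
	char name[SKETCH_HEAP_NAME_MAX_LEN];
	unsigned int socket_id;
};

struct sketch_heap sketch_heaps[SKETCH_MAX_HEAPS];

/* Name the first n heaps "socket_<id>", as patch 08 does for default heaps. */
void
sketch_init_default_heaps(unsigned int n)
{
	unsigned int i;

	memset(sketch_heaps, 0, sizeof(sketch_heaps));
	for (i = 0; i < n && i < SKETCH_MAX_HEAPS; i++) {
		snprintf(sketch_heaps[i].name, sizeof(sketch_heaps[i].name),
				"socket_%u", i);
		sketch_heaps[i].socket_id = i;
	}
}

/* Return the socket ID of the named heap, or -1 with errno set. */
int
sketch_heap_get_socket(const char *name)
{
	size_t len = 0;
	unsigned int i;

	if (name == NULL) {
		errno = EINVAL;
		return -1;
	}
	/* bounded length: reject empty and unterminated/too-long names */
	while (len < SKETCH_HEAP_NAME_MAX_LEN && name[len] != '\0')
		len++;
	if (len == 0 || len == SKETCH_HEAP_NAME_MAX_LEN) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		if (strncmp(name, sketch_heaps[i].name,
				SKETCH_HEAP_NAME_MAX_LEN) == 0)
			return (int)sketch_heaps[i].socket_id;
	}
	errno = ENOENT;
	return -1;
}
```

Rejecting the empty name up front matters here, because unused slots have empty names and would otherwise match.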

* [dpdk-dev] [PATCH v7 10/21] malloc: add function to check if socket is external
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (9 preceding siblings ...)
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
@ 2018-10-01 11:04           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 11/21] malloc: allow creating malloc heaps Anatoly Burakov
                             ` (10 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
  To: dev
  Cc: Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, alejandro.lucero

An API is needed to check whether a particular socket ID belongs
to an internal or external heap. The primary user of this would be
the mempool allocator, because the normal assumption of IOVA
contiguousness in IOVA as VA mode does not hold for
externally allocated memory.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 15 +++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 25 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 lib/librte_mempool/rte_mempool.c           | 22 ++++++++++++++++---
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8870732a6..403271ddc 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -277,6 +277,21 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_get_socket(const char *name);
 
+/**
+ * Check if a given socket ID refers to externally allocated memory.
+ *
+ * @note Passing SOCKET_ID_ANY will return 0.
+ *
+ * @param socket_id
+ *   Socket ID to check
+ * @return
+ *   1 if socket ID refers to externally allocated memory
+ *   0 if socket ID refers to internal DPDK memory
+ *   -1 if socket ID is invalid
+ */
+int __rte_experimental
+rte_malloc_heap_socket_is_external(int socket_id);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b807dfe09..fa81d7862 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -220,6 +220,31 @@ rte_malloc_heap_get_socket(const char *name)
 	return ret;
 }
 
+int
+rte_malloc_heap_socket_is_external(int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int idx;
+	int ret = -1;
+
+	if (socket_id == SOCKET_ID_ANY)
+		return 0;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if ((int)tmp->socket_id == socket_id) {
+			/* external memory always has large socket ID's */
+			ret = tmp->socket_id >= RTE_MAX_NUMA_NODES;
+			break;
+		}
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d8f9665b8..bd60506af 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2ed539f01..683b216f9 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -428,12 +428,18 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	rte_iova_t iova;
 	unsigned mz_id, n;
 	int ret;
-	bool no_contig, try_contig, no_pageshift;
+	bool no_contig, try_contig, no_pageshift, external;
 
 	ret = mempool_ops_alloc_once(mp);
 	if (ret != 0)
 		return ret;
 
+	/* check if we can retrieve a valid socket ID */
+	ret = rte_malloc_heap_socket_is_external(mp->socket_id);
+	if (ret < 0)
+		return -EINVAL;
+	external = ret;
+
 	/* mempool must not be populated */
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
@@ -481,9 +487,19 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	 * in one contiguous chunk as well (otherwise we might end up wasting a
 	 * 1G page on a 10MB memzone). If we fail to get enough contiguous
 	 * memory, then we'll go and reserve space page-by-page.
+	 *
+	 * We also have to take into account the fact that memory that we're
+	 * going to allocate from can belong to an externally allocated memory
+	 * area, in which case the assumption of IOVA as VA mode being
+	 * synonymous with IOVA contiguousness will not hold. We should also try
+	 * to go for contiguous memory even if we're in no-huge mode, because
+	 * external memory may in fact be IOVA-contiguous.
 	 */
-	no_pageshift = no_contig || rte_eal_iova_mode() == RTE_IOVA_VA;
-	try_contig = !no_contig && !no_pageshift && rte_eal_has_hugepages();
+	external = rte_malloc_heap_socket_is_external(mp->socket_id) == 1;
+	no_pageshift = no_contig ||
+			(!external && rte_eal_iova_mode() == RTE_IOVA_VA);
+	try_contig = !no_contig && !no_pageshift &&
+			(rte_eal_has_hugepages() || external);
 
 	if (no_pageshift) {
 		pg_sz = 0;
-- 
2.17.1
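
The effect of the `external` flag on mempool population can be checked with a small truth-table model. This is only a sketch of the two boolean expressions from the diff above: the struct and function names are hypothetical, and the EAL queries are replaced by plain boolean arguments.

```c
#include <stdbool.h>

struct sketch_populate_flags {
	bool no_pageshift; /* treat memory as one IOVA-contiguous area */
	bool try_contig;   /* first attempt a single contiguous reservation */
};

/*
 * iova_as_va     stands for rte_eal_iova_mode() == RTE_IOVA_VA
 * has_hugepages  stands for rte_eal_has_hugepages()
 * external       stands for rte_malloc_heap_socket_is_external() == 1
 */
struct sketch_populate_flags
sketch_compute_flags(bool no_contig, bool iova_as_va, bool has_hugepages,
		bool external)
{
	struct sketch_populate_flags f;

	/* IOVA as VA implies contiguousness only for internal memory */
	f.no_pageshift = no_contig || (!external && iova_as_va);
	/* external memory may be contiguous even in no-huge mode */
	f.try_contig = !no_contig && !f.no_pageshift &&
			(has_hugepages || external);
	return f;
}
```

For example, an external heap in IOVA as VA mode without hugepages yields no_pageshift = false and try_contig = true, whereas an internal heap with the same settings yields no_pageshift = true.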

* [dpdk-dev] [PATCH v7 11/21] malloc: allow creating malloc heaps
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (10 preceding siblings ...)
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 10/21] malloc: add function to check if socket is external Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 12/21] malloc: allow destroying heaps Anatoly Burakov
                             ` (9 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to allow creating new malloc heaps. They will be created
with socket IDs at or above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst        |  2 +
 .../common/include/rte_eal_memconfig.h        |  3 ++
 lib/librte_eal/common/include/rte_malloc.h    | 19 +++++++
 lib/librte_eal/common/malloc_heap.c           | 37 +++++++++++++
 lib/librte_eal/common/malloc_heap.h           |  3 ++
 lib/librte_eal/common/rte_malloc.c            | 52 +++++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  1 +
 7 files changed, 117 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
        supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
        a new flag indicating whether the memseg list refers to external memory.
        Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+       Structure ``rte_eal_memconfig`` has been extended to contain next socket
+       ID for externally allocated memory segments.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
 	/* Heaps of Malloc */
 	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
+	/* next socket ID for external malloc heap */
+	int next_socket_id;
+
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
 	 */
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b28905817..00fdf54f7 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1019,6 +1023,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	uint32_t next_socket_id = mcfg->next_socket_id;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id;
+
+	/* we hold a global mem hotplug writelock, so it's safe to increment */
+	mcfg->next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
@@ -1026,6 +1060,9 @@ rte_eal_malloc_heap_init(void)
 	unsigned int i;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign min socket ID to external heaps */
+		mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
 		/* assign names to default DPDK heaps */
 		for (i = 0; i < rte_socket_count(); i++) {
 			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1
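
The slot search and socket ID assignment in this patch can be sketched as a standalone model. Everything prefixed `demo_` or `DEMO_` is hypothetical: the table is a plain global instead of shared memory config, errno stands in for rte_errno, the write lock is omitted, and the table size and NUMA limit are arbitrary small values.

```c
#include <errno.h>
#include <string.h>

#define DEMO_MAX_HEAPS 4
#define DEMO_HEAP_NAME_MAX_LEN 32
#define DEMO_MAX_NUMA_NODES 8
/* external socket IDs start at max(256, NUMA node limit), as in the patch */
#define DEMO_EXT_MIN_SOCKET_ID \
	((1 << 8) > DEMO_MAX_NUMA_NODES ? (1 << 8) : DEMO_MAX_NUMA_NODES)

struct demo_heap {
	char name[DEMO_HEAP_NAME_MAX_LEN];
	int socket_id;
};

struct demo_heap demo_heaps[DEMO_MAX_HEAPS];
int demo_next_socket_id = DEMO_EXT_MIN_SOCKET_ID;

/* Create a named heap; returns 0, or -1 with errno set. */
int
demo_heap_create(const char *name)
{
	struct demo_heap *slot = NULL;
	size_t len = 0;
	int i;

	if (name != NULL)
		while (len < DEMO_HEAP_NAME_MAX_LEN && name[len] != '\0')
			len++;
	if (name == NULL || len == 0 || len == DEMO_HEAP_NAME_MAX_LEN) {
		errno = EINVAL;
		return -1;
	}
	/* like the patch, stop at the first free slot */
	for (i = 0; i < DEMO_MAX_HEAPS; i++) {
		if (strncmp(name, demo_heaps[i].name,
				DEMO_HEAP_NAME_MAX_LEN) == 0) {
			errno = EEXIST;
			return -1;
		}
		if (demo_heaps[i].name[0] == '\0') {
			slot = &demo_heaps[i];
			break;
		}
	}
	if (slot == NULL) {
		errno = ENOSPC;
		return -1;
	}
	memcpy(slot->name, name, len + 1); /* len < max, so this fits */
	slot->socket_id = demo_next_socket_id++;
	return 0;
}
```

With these stand-in limits the first two heaps created receive socket IDs 256 and 257, which keeps them clear of any NUMA node ID.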

* [dpdk-dev] [PATCH v7 12/21] malloc: allow destroying heaps
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (11 preceding siblings ...)
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 11/21] malloc: allow creating malloc heaps Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
                             ` (8 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to destroy a specified heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
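Note for readers (not part of the commit): the order in which the destroy
path refuses a request can be modelled in plain C. The sketch below is a
standalone approximation, not DPDK code. It assumes a simplified
``struct model_heap``, hypothetical limits MODEL_MAX_NUMA_NODES and
MODEL_NAME_MAX_LEN, and it returns a negative errno value instead of
setting rte_errno.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for RTE_MAX_NUMA_NODES and RTE_HEAP_NAME_MAX_LEN. */
#define MODEL_MAX_NUMA_NODES 8
#define MODEL_NAME_MAX_LEN 32

/* Simplified model of the heap fields that the destroy path inspects. */
struct model_heap {
	char name[MODEL_NAME_MAX_LEN];
	unsigned int socket_id;   /* below MODEL_MAX_NUMA_NODES: internal heap */
	unsigned int alloc_count; /* allocations not yet freed */
	void *first;              /* first element, NULL if no segments */
	void *last;               /* last element, NULL if no segments */
};

/* Returns 0 on success, or a negative errno value naming the refusal. */
static int
model_heap_destroy(struct model_heap *heap, const char *heap_name)
{
	const char *end;

	if (heap_name == NULL)
		return -EINVAL;
	/* bounded scan, standing in for the strnlen() checks in the patch */
	end = memchr(heap_name, '\0', MODEL_NAME_MAX_LEN);
	if (end == NULL || end == heap_name)
		return -EINVAL; /* too long, or empty */
	/* lookup is reduced to a single candidate heap in this model */
	if (heap == NULL ||
			strncmp(heap_name, heap->name, MODEL_NAME_MAX_LEN) != 0)
		return -ENOENT;
	if (heap->socket_id < MODEL_MAX_NUMA_NODES)
		return -EPERM; /* internal heaps are reserved */
	if (heap->alloc_count != 0)
		return -EBUSY; /* memory is still allocated from the heap */
	if (heap->first != NULL || heap->last != NULL)
		return -EBUSY; /* segments were not removed first */
	/* wiping the structure also frees the name slot for reuse */
	memset(heap, 0, sizeof(*heap));
	return 0;
}
```

In this model, as in the patch, both "still allocated" and "segments still
present" map to EBUSY, so a caller cannot tell the two apart from the error
code alone.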
 lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++
 lib/librte_eal/common/malloc_heap.c        | 22 ++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 58 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 107 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index e326529d0..309bbbcc9 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
 
+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ *   from the heap has been freed back to the heap
+ *
+ * @note This function will return a failure result if not all memory segments
+ *   were removed from the heap prior to its destruction
+ *
+ * @param heap_name
+ *   Name of the heap to destroy.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     ENOENT - heap by the name of ``heap_name`` was not found
+ *     EPERM  - attempting to destroy reserved heap
+ *     EBUSY  - heap still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 00fdf54f7..ca774c96f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1053,6 +1053,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	return 0;
 }
 
+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+	if (heap->alloc_count != 0) {
+		RTE_LOG(ERR, EAL, "Heap is still in use\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->first != NULL || heap->last != NULL) {
+		RTE_LOG(ERR, EAL, "Heap still contains memory segments\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->total_size != 0)
+		RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+	/* after this, the lock will be dropped */
+	memset(heap, 0, sizeof(*heap));
+
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index eebee16dc..75278da3c 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 25967a7cb..286e748ef 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -313,6 +313,21 @@ rte_malloc_virt2iova(const void *addr)
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
 
+static struct malloc_heap *
+find_named_heap(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN))
+			return heap;
+	}
+	return NULL;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -363,3 +378,46 @@ rte_malloc_heap_create(const char *heap_name)
 
 	return ret;
 }
+
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* start from non-socket heaps */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to destroy internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	/* sanity checks done, now we can destroy the heap */
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_destroy(heap);
+
+	/* if we failed, lock is still active */
+	if (ret < 0)
+		rte_spinlock_unlock(&heap->lock);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 376f33bbb..27aac5bea 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_create;
+	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v7 13/21] malloc: allow adding memory to named heaps
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (12 preceding siblings ...)
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 12/21] malloc: allow destroying heaps Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 14/21] malloc: allow removing memory from " Anatoly Burakov
                             ` (7 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to add externally allocated memory to malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
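Note for readers (not part of the commit): the page-count check and the
per-page IOVA fill described above can be sketched in standalone C. This is
a model under stated assumptions, not the DPDK implementation:
``struct model_seg``, ``model_fill_segs`` and ``max_segs`` are hypothetical,
and MODEL_BAD_IOVA assumes RTE_BAD_IOVA is the all-ones value.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for RTE_BAD_IOVA (all bits set). */
#define MODEL_BAD_IOVA UINT64_MAX

/* Simplified model of one page entry in a memseg list. */
struct model_seg {
	void *addr;
	uint64_t iova;
	size_t len;
};

/*
 * Fill one entry per page. Returns the number of pages described, or a
 * negative errno value. max_segs is a model-only capacity limit.
 */
static int
model_fill_segs(struct model_seg *segs, size_t max_segs, void *va_addr,
		size_t len, const uint64_t *iova_addrs, unsigned int n_pages,
		size_t page_sz)
{
	size_t n, i;

	/* page size must be a non-zero power of two */
	if (segs == NULL || va_addr == NULL || page_sz == 0 ||
			(page_sz & (page_sz - 1)) != 0)
		return -EINVAL;
	n = len / page_sz;
	/* n_pages is only checked when an IOVA table is supplied */
	if (iova_addrs != NULL && n != n_pages)
		return -EINVAL;
	if (n > max_segs)
		return -ENOSPC;
	for (i = 0; i < n; i++) {
		segs[i].addr = (char *)va_addr + i * page_sz;
		segs[i].iova = iova_addrs == NULL ?
				MODEL_BAD_IOVA : iova_addrs[i];
		segs[i].len = page_sz;
	}
	return (int)n;
}
```

Because the page count comes from an integer division, a ``len`` that is not
a multiple of ``page_sz`` is rounded down in this model, which mirrors the
``n = len / page_sz`` computation in this version of the patch.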
 lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 74 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 51 +++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 169 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 309bbbcc9..fb5b6e2f7 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,45 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Add memory chunk to a heap with specified name.
+ *
+ * @note Multiple memory chunks can be added to the same heap
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ *   malloc heap. Failing to do so will result in undefined behavior, up to and
+ *   including segmentation faults.
+ *
+ * @note Calling this function will erase any contents already present at the
+ *   supplied memory address.
+ *
+ * @param heap_name
+ *   Name of the heap to add memory chunk to
+ * @param va_addr
+ *   Start of virtual area to add to the heap
+ * @param len
+ *   Length of virtual area to add to the heap
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to add memory to a reserved heap
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ca774c96f..256c25edf 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1023,6 +1023,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	char fbarray_name[RTE_FBARRAY_NAME_LEN];
+	struct rte_memseg_list *msl = NULL;
+	struct rte_fbarray *arr;
+	size_t seg_len = n_pages * page_sz;
+	unsigned int i;
+
+	/* first, find a free memseg list */
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+		if (tmp->base_va == NULL) {
+			msl = tmp;
+			break;
+		}
+	}
+	if (msl == NULL) {
+		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+			heap->name, va_addr);
+
+	/* create the backing fbarray */
+	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+			sizeof(struct rte_memseg)) < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+		return -1;
+	}
+	arr = &msl->memseg_arr;
+
+	/* fbarray created, fill it up */
+	for (i = 0; i < n_pages; i++) {
+		struct rte_memseg *ms;
+
+		rte_fbarray_set_used(arr, i);
+		ms = rte_fbarray_get(arr, i);
+		ms->addr = RTE_PTR_ADD(va_addr, i * page_sz);
+		ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+		ms->hugepage_sz = page_sz;
+		ms->len = page_sz;
+		ms->nchannel = rte_memory_get_nchannel();
+		ms->nrank = rte_memory_get_nrank();
+		ms->socket_id = heap->socket_id;
+	}
+
+	/* set up the memseg list */
+	msl->base_va = va_addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = heap->socket_id;
+	msl->len = seg_len;
+	msl->version = 0;
+	msl->external = 1;
+
+	/* erase contents of new memory */
+	memset(va_addr, 0, seg_len);
+
+	/* now, add newly minted memory to the malloc heap */
+	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+	heap->total_size += seg_len;
+
+	/* all done! */
+	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+			heap->name, va_addr);
+
+	return 0;
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 75278da3c..237ce9dc2 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 286e748ef..acdbd92a2 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -328,6 +328,57 @@ find_named_heap(const char *name)
 	return NULL;
 }
 
+int
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int n;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL ||
+			page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		/* hotplug lock is not held yet, so do not jump to unlock */
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot add memory to internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	n = len / page_sz;
+	if (n != n_pages && iova_addrs != NULL) {
+		rte_errno = EINVAL;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+			page_sz);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 27aac5bea..02254042c 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -321,6 +321,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_create;
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_memory_add;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v7 14/21] malloc: allow removing memory from named heaps
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (13 preceding siblings ...)
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
                             ` (6 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
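Note for readers (not part of the commit): the removal checks described
above (find the element at the given address, confirm the region is the one
originally added, confirm it is entirely free) can be modelled in standalone
C. ``struct model_elem`` and ``model_check_removable`` are hypothetical
names, and ``area_len`` stands in for the length recorded in the memseg
list.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum model_state { MODEL_FREE, MODEL_BUSY };

/* Simplified model of a malloc element in an address-ordered list. */
struct model_elem {
	struct model_elem *next;
	enum model_state state;
	size_t size;     /* size of this element */
	size_t area_len; /* length of the chunk as originally added */
};

/* Returns 0 if the chunk at va_addr may be removed, else negative errno. */
static int
model_check_removable(struct model_elem *first, void *va_addr, size_t len)
{
	struct model_elem *elem = first;

	while (elem != NULL && (void *)elem != va_addr) {
		elem = elem->next;
		/* the list is address-ordered: stop once past the target */
		if ((uintptr_t)elem > (uintptr_t)va_addr)
			return -ENOENT;
	}
	/* not found, or not the region that was originally added */
	if (elem == NULL || elem->area_len != len)
		return -ENOENT;
	/* a free element spanning the whole region means nothing is in use */
	if (elem->state == MODEL_BUSY || elem->size != len)
		return -EBUSY;
	return 0;
}
```

The size comparison carries the "all elements are free" test: once every
allocation is freed and neighbours are merged, a single free element covers
the whole region, so any smaller first element implies a live allocation
somewhere inside it.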
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++
 lib/librte_eal/common/malloc_heap.c        | 54 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 125 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index fb5b6e2f7..40bae4478 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -302,6 +302,33 @@ int __rte_experimental
 rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+/**
+ * Remove memory chunk from heap with specified name.
+ *
+ * @note Memory chunk being removed must be the same as one that was added;
+ *   partially removing memory chunks is not supported
+ *
+ * @note Memory area must not contain any allocated elements to allow its
+ *   removal from the heap
+ *
+ * @param heap_name
+ *   Name of the heap to remove memory from
+ * @param va_addr
+ *   Virtual address to remove from the heap
+ * @param len
+ *   Length of virtual area to remove from the heap
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to remove memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ *     EBUSY  - memory chunk still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 256c25edf..adc1669aa 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1023,6 +1023,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+	struct malloc_heap *heap = elem->heap;
+	struct rte_memseg_list *msl;
+
+	msl = elem->msl;
+
+	/* this element can be removed */
+	malloc_elem_free_list_remove(elem);
+	malloc_elem_hide_region(elem, elem, len);
+
+	heap->total_size -= len;
+
+	memset(elem, 0, sizeof(*elem));
+
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -1097,6 +1123,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	return 0;
 }
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len)
+{
+	struct malloc_elem *elem = heap->first;
+
+	/* find element with specified va address */
+	while (elem != NULL && elem != va_addr) {
+		elem = elem->next;
+		/* stop if we've blown past our VA */
+		if (elem > (struct malloc_elem *)va_addr) {
+			rte_errno = ENOENT;
+			return -1;
+		}
+	}
+	/* check if element was found */
+	if (elem == NULL || elem->msl->len != len) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+	/* if element's size is not equal to segment len, segment is busy */
+	if (elem->state == ELEM_BUSY || elem->size != len) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+	return destroy_seg(elem, len);
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 237ce9dc2..e48996d52 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -43,6 +43,10 @@ int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index acdbd92a2..bfc49d0b7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -379,6 +379,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot remove memory from internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 02254042c..8c66d0be9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v7 15/21] malloc: allow attaching to external memory chunks
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (14 preceding siblings ...)
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 14/21] malloc: allow removing memory from " Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 16/21] malloc: allow detaching from external memory Anatoly Burakov
                             ` (5 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
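Note for readers (not part of the commit): the attach path uses a
walk-with-callback pattern in which the callback stops the walk on the first
match and reports its outcome through the argument structure. The standalone
sketch below models only that pattern. ``struct model_msl``,
``model_list_walk`` and the ``attached`` flag are hypothetical, and the flag
merely stands in for the fbarray attach call.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified model of a memseg list. */
struct model_msl {
	void *base_va;  /* NULL marks an unused slot */
	size_t page_sz;
	size_t n_pages;
	int attached;   /* stands in for the fbarray attach */
};

/* Carries the search key in and the outcome out of the walk. */
struct model_walk_arg {
	void *va_addr;
	size_t len;
	int result;
};

typedef int (*model_walk_t)(struct model_msl *msl, void *arg);

/* Call func on each used list; a non-zero return stops the walk. */
static int
model_list_walk(struct model_msl *lists, size_t n_lists, model_walk_t func,
		void *arg)
{
	size_t i;

	for (i = 0; i < n_lists; i++) {
		int ret;

		if (lists[i].base_va == NULL)
			continue;
		ret = func(&lists[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

static int
model_attach_cb(struct model_msl *msl, void *arg)
{
	struct model_walk_arg *wa = arg;

	/* both the start address and the total length must match */
	if (msl->base_va == wa->va_addr &&
			msl->page_sz * msl->n_pages == wa->len) {
		msl->attached = 1;
		wa->result = 0;
		return 1; /* found: stop walking */
	}
	return 0; /* keep walking */
}

static int
model_attach(struct model_msl *lists, size_t n_lists, void *va_addr,
		size_t len)
{
	/* fail unless the callback explicitly reports success */
	struct model_walk_arg wa = { va_addr, len, -ENOENT };

	model_list_walk(lists, n_lists, model_attach_cb, &wa);
	return wa.result;
}
```

Initialising ``result`` to -ENOENT before the walk means a walk that finds
no match needs no extra handling: the default value is already the error to
report.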
 lib/librte_eal/common/include/rte_malloc.h | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 83 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 112 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 40bae4478..793f9473a 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket,
  *
  * @note Multiple memory chunks can be added to the same heap
  *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling
+ *   ``rte_malloc_heap_memory_attach`` in each other process.
+ *
  * @note Memory must be previously allocated for DPDK to be able to use it as a
  *   malloc heap. Failing to do so will result in undefined behavior, up to and
  *   including segmentation faults.
@@ -329,6 +333,30 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 int __rte_experimental
 rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Attach to a chunk of external memory already added by another process.
+ *
+ * @note This function must be called before any attempt is made to use an
+ *   already existing external memory chunk. This function does *not* need to
+ *   be called if a call to ``rte_malloc_heap_memory_add`` was made in the
+ *   current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to attach to
+ * @param len
+ *   Length of memory chunk to attach to
+ * @return
+ *   0 on successful attach
+ *   -1 on unsuccessful attach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to attach memory to a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index bfc49d0b7..5078235b1 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -418,6 +418,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+struct sync_mem_walk_arg {
+	void *va_addr;
+	size_t len;
+	int result;
+};
+
+static int
+attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct sync_mem_walk_arg *wa = arg;
+	size_t len = msl->page_sz * msl->memseg_arr.len;
+
+	if (msl->base_va == wa->va_addr &&
+			len == wa->len) {
+		struct rte_memseg_list *found_msl;
+		int msl_idx, ret;
+
+		/* msl is const */
+		msl_idx = msl - mcfg->memsegs;
+		found_msl = &mcfg->memsegs[msl_idx];
+
+		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+
+		if (ret < 0)
+			wa->result = -rte_errno;
+		else
+			wa->result = 0;
+		return 1;
+	}
+	return 0;
+}
+
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	struct sync_mem_walk_arg wa;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to attach to internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* find corresponding memseg list to attach to */
+	wa.va_addr = va_addr;
+	wa.len = len;
+	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+
+	/* we're already holding a read lock */
+	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+
+	if (wa.result < 0) {
+		rte_errno = -wa.result;
+		ret = -1;
+	} else {
+		ret = 0;
+	}
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 8c66d0be9..920852042 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_attach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v7 16/21] malloc: allow detaching from external memory
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (15 preceding siblings ...)
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 17/21] malloc: enable event callbacks for " Anatoly Burakov
                             ` (4 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to detach from an existing chunk of external memory in a
process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 31 +++++++++++++++++-----
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 793f9473a..7249e6aae 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
  * @note Memory area must not contain any allocated elements to allow its
  *   removal from the heap
  *
+ * @note All other processes must detach from the memory chunk prior to it being
+ *   removed from the heap.
+ *
  * @param heap_name
  *   Name of the heap to remove memory from
  * @param va_addr
@@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 int __rte_experimental
 rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Detach from a chunk of external memory in a secondary process.
+ *
+ * @note This function must be called before any attempt is made to remove
+ *   external memory from the heap in another process. This function does *not*
+ *   need to be called if ``rte_malloc_heap_memory_remove`` will be called in
+ *   the current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to detach from
+ * @param len
+ *   Length of memory chunk to detach from
+ * @return
+ *   0 on successful detach
+ *   -1 on unsuccessful detach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to detach memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5078235b1..72e42b337 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -422,10 +422,11 @@ struct sync_mem_walk_arg {
 	void *va_addr;
 	size_t len;
 	int result;
+	bool attach;
 };
 
 static int
-attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct sync_mem_walk_arg *wa = arg;
@@ -440,7 +441,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		if (wa->attach)
+			ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		else
+			ret = rte_fbarray_detach(&found_msl->memseg_arr);
 
 		if (ret < 0)
 			wa->result = -rte_errno;
@@ -451,8 +455,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 	return 0;
 }
 
-int
-rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+static int
+sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
@@ -475,20 +479,21 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 		ret = -1;
 		goto unlock;
 	}
-	/* we shouldn't be able to attach to internal heaps */
+	/* we shouldn't be able to sync to internal heaps */
 	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
 		rte_errno = EPERM;
 		ret = -1;
 		goto unlock;
 	}
 
-	/* find corresponding memseg list to attach to */
+	/* find corresponding memseg list to sync to */
 	wa.va_addr = va_addr;
 	wa.len = len;
 	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+	wa.attach = attach;
 
 	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
 
 	if (wa.result < 0) {
 		rte_errno = -wa.result;
@@ -501,6 +506,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, true);
+}
+
+int
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, false);
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 920852042..30583eef2 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -323,6 +323,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
 	rte_malloc_heap_memory_attach;
+	rte_malloc_heap_memory_detach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v7 17/21] malloc: enable event callbacks for external memory
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (16 preceding siblings ...)
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 16/21] malloc: allow detaching from external memory Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 18/21] test: add unit tests for external memory support Anatoly Burakov
                             ` (3 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: Hemant Agrawal, Shreyansh Jain, Maxime Coquelin, Tiwei Bie,
	Zhihong Wang, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, thomas, shahafs, arybchenko, alejandro.lucero

When adding or removing external memory from the memory map, there
may be actions that need to be taken on account of this memory (e.g.
DMA mapping). Add support for triggering callbacks when adding,
removing, attaching or detaching external memory.

Some memory event callback handlers will need additional logic to
handle external memory regions. For example, the virtio callback has
to completely ignore externally allocated memory, because there is no
generic way to find the file descriptors backing a memory address.
All other callbacks have also been adjusted to handle RTE_BAD_IOVA as
the IOVA address, as this is one of the expected use cases for
external memory support.
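
The skip-on-bad-IOVA pattern used by the adjusted callbacks can be
sketched roughly as follows. This is only an illustration, not code from
the patch: `sketch_memseg`, `SKETCH_BAD_IOVA` and `sketch_mappable_bytes`
are hypothetical stand-ins for `struct rte_memseg`, `RTE_BAD_IOVA` and a
real DMA-mapping handler, and it assumes every segment has non-zero length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RTE_BAD_IOVA (all bits set); hypothetical local name. */
#define SKETCH_BAD_IOVA ((uint64_t)-1)

/* Hypothetical, simplified view of one memory segment (page). */
struct sketch_memseg {
	uint64_t iova; /* IO address, or SKETCH_BAD_IOVA if unknown */
	size_t len;    /* segment length in bytes, assumed non-zero */
};

/*
 * Walk the segments covering `len` bytes and return how many bytes a
 * DMA-mapping callback would map, skipping segments without a valid IOVA.
 */
static size_t
sketch_mappable_bytes(const struct sketch_memseg *ms, size_t len)
{
	size_t cur_len = 0, mapped = 0;

	assert(ms != NULL);
	while (cur_len < len) {
		/* a real handler would map or unmap the segment here */
		if (ms->iova != SKETCH_BAD_IOVA)
			mapped += ms->len;
		/* always advance, even when the segment was skipped */
		cur_len += ms->len;
		++ms;
	}
	return mapped;
}
```

The key point is that a skipped segment still advances the cursor, so
one segment without an IOVA does not stall or abort the walk.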

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/bus/fslmc/fslmc_vfio.c                |  7 +++++
 .../net/virtio/virtio_user/virtio_user_dev.c  |  8 ++++++
 lib/librte_eal/common/malloc_heap.c           |  7 +++++
 lib/librte_eal/common/rte_malloc.c            | 27 ++++++++++++++++---
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 10 +++++--
 5 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index cb33dd891..493b6e5be 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -221,6 +221,13 @@ fslmc_memevent_cb(enum rte_mem_event type, const void *addr, size_t len,
 					"alloc" : "dealloc",
 				va, virt_addr, iova_addr, map_len);
 
+		/* iova_addr may be set to RTE_BAD_IOVA */
+		if (iova_addr == RTE_BAD_IOVA) {
+			DPAA2_BUS_DEBUG("Segment has invalid iova, skipping\n");
+			cur_len += map_len;
+			continue;
+		}
+
 		if (type == RTE_MEM_EVENT_ALLOC)
 			ret = fslmc_map_dma(virt_addr, iova_addr, map_len);
 		else
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 7df600b02..de813d0df 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -13,6 +13,8 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 
+#include <rte_eal_memconfig.h>
+
 #include "vhost.h"
 #include "virtio_user_dev.h"
 #include "../virtio_ethdev.h"
@@ -282,8 +284,14 @@ virtio_user_mem_event_cb(enum rte_mem_event type __rte_unused,
 						 void *arg)
 {
 	struct virtio_user_dev *dev = arg;
+	struct rte_memseg_list *msl;
 	uint16_t i;
 
+	/* ignore externally allocated memory */
+	msl = rte_mem_virt2memseg_list(addr);
+	if (msl->external)
+		return;
+
 	pthread_mutex_lock(&dev->mutex);
 
 	if (dev->started == false)
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index adc1669aa..08ec75377 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1031,6 +1031,9 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 
 	msl = elem->msl;
 
+	/* notify all subscribers that a memory area is going to be removed */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, elem, len);
+
 	/* this element can be removed */
 	malloc_elem_free_list_remove(elem);
 	malloc_elem_hide_region(elem, elem, len);
@@ -1120,6 +1123,10 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
 			heap->name, va_addr);
 
+	/* notify all subscribers that a new memory area has been added */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+			va_addr, seg_len);
+
 	return 0;
 }
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72e42b337..2c19c2f87 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -25,6 +25,7 @@
 #include <rte_malloc.h>
 #include "malloc_elem.h"
 #include "malloc_heap.h"
+#include "eal_memalloc.h"
 
 
 /* Free the memory space back to heap */
@@ -441,15 +442,29 @@ sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		if (wa->attach)
+		if (wa->attach) {
 			ret = rte_fbarray_attach(&found_msl->memseg_arr);
-		else
+		} else {
+			/* notify all subscribers that a memory area is about to
+			 * be removed
+			 */
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+					msl->base_va, msl->len);
 			ret = rte_fbarray_detach(&found_msl->memseg_arr);
+		}
 
-		if (ret < 0)
+		if (ret < 0) {
 			wa->result = -rte_errno;
-		else
+		} else {
+			/* notify all subscribers that a new memory area was
+			 * added
+			 */
+			if (wa->attach)
+				eal_memalloc_mem_event_notify(
+						RTE_MEM_EVENT_ALLOC,
+						msl->base_va, msl->len);
 			wa->result = 0;
+		}
 		return 1;
 	}
 	return 0;
@@ -499,6 +514,10 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 		rte_errno = -wa.result;
 		ret = -1;
 	} else {
+		/* notify all subscribers that a new memory area was added */
+		if (attach)
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+					va_addr, len);
 		ret = 0;
 	}
 unlock:
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index fddbc3b54..d7268e4ce 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -509,7 +509,7 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	msl = rte_mem_virt2memseg_list(addr);
 
 	/* for IOVA as VA mode, no need to care for IOVA addresses */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
+	if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) {
 		uint64_t vfio_va = (uint64_t)(uintptr_t)addr;
 		if (type == RTE_MEM_EVENT_ALLOC)
 			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
@@ -523,13 +523,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	/* memsegs are contiguous in memory */
 	ms = rte_mem_virt2memseg(addr, msl);
 	while (cur_len < len) {
+		/* some memory segments may have invalid IOVA */
+		if (ms->iova == RTE_BAD_IOVA) {
+			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
+					ms->addr);
+			goto next;
+		}
 		if (type == RTE_MEM_EVENT_ALLOC)
 			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
 					ms->iova, ms->len, 1);
 		else
 			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
 					ms->iova, ms->len, 0);
-
+next:
 		cur_len += ms->len;
 		++ms;
 	}
-- 
2.17.1

* [dpdk-dev] [PATCH v7 18/21] test: add unit tests for external memory support
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (17 preceding siblings ...)
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 17/21] malloc: enable event callbacks for " Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 19/21] app/testpmd: add support for external memory Anatoly Burakov
                             ` (2 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add simple unit tests to test external memory support.
The tests are pretty basic and mostly consist of checking
if invalid API calls are handled correctly, plus a simple
allocation/deallocation test for malloc and memzone.
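
The allocation test feeds the heap a dummy IOVA table with one entry
per page, laid out contiguously so that an IOVA-contiguous memzone can
be reserved. A minimal sketch of building and checking such a table is
shown below; the helper names are hypothetical and `uint64_t` stands in
for `rte_iova_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a per-page IOVA table with consecutive addresses starting at base. */
static void
sketch_fill_iova(uint64_t *iova, size_t n_pages, uint64_t base, size_t pgsz)
{
	size_t i;

	assert(iova != NULL);
	for (i = 0; i < n_pages; i++)
		iova[i] = base + (uint64_t)i * pgsz;
}

/* Return true if each page's IOVA directly follows the previous page's. */
static bool
sketch_iova_is_contiguous(const uint64_t *iova, size_t n_pages, size_t pgsz)
{
	size_t i;

	assert(iova != NULL);
	for (i = 1; i < n_pages; i++)
		if (iova[i] != iova[i - 1] + pgsz)
			return false;
	return true;
}
```

With 4K pages and a 4M area this gives a 1024-entry table, matching the
sizes used by the unit test.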

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 test/test/Makefile            |   1 +
 test/test/autotest_data.py    |  14 +-
 test/test/meson.build         |   1 +
 test/test/test_external_mem.c | 389 ++++++++++++++++++++++++++++++++++
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 test/test/test_external_mem.c

diff --git a/test/test/Makefile b/test/test/Makefile
index e6967bab6..074ac6e03 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c
 SRCS-y += test_reciprocal_division.c
 SRCS-y += test_reciprocal_division_perf.c
 SRCS-y += test_fbarray.c
+SRCS-y += test_external_mem.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py
index f68d9b111..51f8e1689 100644
--- a/test/test/autotest_data.py
+++ b/test/test/autotest_data.py
@@ -477,10 +477,16 @@
         "Report":  None,
     },
     {
-        "Name":    "Fbarray autotest",
-        "Command": "fbarray_autotest",
-        "Func":    default_autotest,
-        "Report":  None,
+	"Name":    "Fbarray autotest",
+	"Command": "fbarray_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
+    },
+    {
+	"Name":    "External memory autotest",
+	"Command": "external_mem_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
     },
     #
     #Please always keep all dump tests at the end and together!
diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6eca2..3abf02b71 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -155,6 +155,7 @@ test_names = [
 	'eventdev_common_autotest',
 	'eventdev_octeontx_autotest',
 	'eventdev_sw_autotest',
+	'external_mem_autotest',
 	'func_reentrancy_autotest',
 	'flow_classify_autotest',
 	'hash_scaling_autotest',
diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c
new file mode 100644
index 000000000..d0837aa35
--- /dev/null
+++ b/test/test/test_external_mem.c
@@ -0,0 +1,389 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+
+#include "test.h"
+
+#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */
+
+static int
+test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova,
+		int n_pages)
+{
+	static const char * const names[] = {
+		NULL, /* NULL name */
+		"",   /* empty name */
+		"this heap name is definitely way too long to be valid"
+	};
+	const char *valid_name = "valid heap name";
+	unsigned int i;
+
+	/* check invalid name handling */
+	for (i = 0; i < RTE_DIM(names); i++) {
+		const char *name = names[i];
+
+		/* these calls may fail for other reasons, so check errno */
+		if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Created heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Destroyed heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_get_socket(name) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Found socket for heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_add(name, addr, len,
+				NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Added memory to heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Removed memory from heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Attached memory to heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Detached memory from heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* do same as above, but with a valid heap name */
+
+	/* skip create call */
+	if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Destroyed non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_get_socket(valid_name) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Found socket for non-existent heap\n",
+				__func__, __LINE__);
+		goto fail;
+	}
+
+	/* these calls may fail for other reasons, so check errno */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Added memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Removed memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Attached memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Detached memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* create a valid heap but test other invalid parameters */
+	if (rte_malloc_heap_create(valid_name) != 0) {
+		printf("%s():%i: Failed to create valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero length */
+	if (rte_malloc_heap_memory_add(valid_name, addr, 0,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero address */
+	if (rte_malloc_heap_memory_add(valid_name, NULL, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* wrong page count */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* tests passed, destroy heap */
+	if (rte_malloc_heap_destroy(valid_name) != 0) {
+		printf("%s():%i: Failed to destroy valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	return 0;
+fail:
+	rte_malloc_heap_destroy(valid_name);
+	return -1;
+}
+
+static int
+test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages)
+{
+	const char *heap_name = "heap";
+	void *ptr = NULL;
+	int socket_id, i;
+	const struct rte_memzone *mz = NULL;
+
+	/* create heap */
+	if (rte_malloc_heap_create(heap_name) != 0) {
+		printf("%s():%i: Failed to create malloc heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* get socket ID corresponding to this heap */
+	socket_id = rte_malloc_heap_get_socket(heap_name);
+	if (socket_id < 0) {
+		printf("%s():%i: cannot find socket for external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* heap is empty, so any allocation should fail */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr != NULL) {
+		printf("%s():%i: Allocated from empty heap\n", __func__,
+			__LINE__);
+		goto fail;
+	}
+
+	/* add memory to heap */
+	if (rte_malloc_heap_memory_add(heap_name, addr, len,
+			iova, n_pages, pgsz) != 0) {
+		printf("%s():%i: Failed to add memory to heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check that we can get this memory from EAL now */
+	for (i = 0; i < n_pages; i++) {
+		const struct rte_memseg *ms;
+		void *cur = RTE_PTR_ADD(addr, pgsz * i);
+
+		ms = rte_mem_virt2memseg(cur, NULL);
+		if (ms == NULL) {
+			printf("%s():%i: Failed to retrieve memseg for external mem\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (ms->addr != cur) {
+			printf("%s():%i: VA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+		if (ms->iova != iova[i]) {
+			printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* allocate - this now should succeed */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr == NULL) {
+		printf("%s():%i: Failed to allocate from external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check if address is in expected range */
+	if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) {
+		printf("%s():%i: Allocated from unexpected address space\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* we've allocated something - removing memory should fail */
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 ||
+			rte_errno != EBUSY) {
+		printf("%s():%i: Removing memory succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) {
+		printf("%s():%i: Destroying heap succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* try allocating an IOVA-contiguous memzone - this should succeed
+	 * because we've set up a contiguous IOVA table.
+	 */
+	mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id,
+			RTE_MEMZONE_IOVA_CONTIG);
+	if (mz == NULL) {
+		printf("%s():%i: Failed to reserve memzone\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	rte_malloc_dump_stats(stdout, NULL);
+	rte_malloc_dump_heaps(stdout);
+
+	/* free memory - removing it should now succeed */
+	rte_free(ptr);
+	ptr = NULL;
+
+	rte_memzone_free(mz);
+	mz = NULL;
+
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) {
+		printf("%s():%i: Removing memory from heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) != 0) {
+		printf("%s():%i: Destroying heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	return 0;
+fail:
+	rte_memzone_free(mz);
+	rte_free(ptr);
+	/* even if something failed, attempt to clean up */
+	rte_malloc_heap_memory_remove(heap_name, addr, len);
+	rte_malloc_heap_destroy(heap_name);
+
+	return -1;
+}
+
+/* we need to test attach/detach in secondary processes. */
+static int
+test_external_mem(void)
+{
+	size_t len = EXTERNAL_MEM_SZ;
+	size_t pgsz = RTE_PGSIZE_4K;
+	rte_iova_t iova[len / pgsz];
+	void *addr;
+	int ret, n_pages;
+
+	/* create external memory area */
+	n_pages = RTE_DIM(iova);
+	addr = mmap(NULL, len, PROT_WRITE | PROT_READ,
+			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED) {
+		printf("%s():%i: Failed to create dummy memory area\n",
+			__func__, __LINE__);
+		return -1;
+	}
+	for (int i = 0; i < n_pages; i++) {
+		/* arbitrary IOVA */
+		rte_iova_t tmp = 0x100000000 + i * pgsz;
+		iova[i] = tmp;
+	}
+
+	ret = test_invalid_param(addr, len, pgsz, iova, n_pages);
+	ret |= test_basic(addr, len, pgsz, iova, n_pages);
+
+	munmap(addr, len);
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem);
-- 
2.17.1

* [dpdk-dev] [PATCH v7 19/21] app/testpmd: add support for external memory
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (18 preceding siblings ...)
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 18/21] test: add unit tests for external memory support Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 20/21] doc: add external memory feature to the release notes Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger, John McNamara,
	Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

Currently, mempools can only be allocated using either native DPDK
memory or anonymous memory. This patch adds two new methods that
allocate mempools from external memory (regular or hugepage memory),
and documents them in the testpmd user guide.
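
For the hugepage case, the patch encodes the requested page size into
the mmap() flags as log2(page size) shifted by MAP_HUGE_SHIFT. A rough,
standalone sketch of that encoding is below. The names are hypothetical,
the shift of 26 is the Linux value (also the patch's fallback), and the
sketch assumes a power-of-two page size rather than rounding up as the
patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Linux value of MAP_HUGE_SHIFT; hypothetical local name. */
#define SKETCH_HUGE_SHIFT 26

/* log2 of a power-of-two value; returns 0 for an input of 0 or 1. */
static unsigned int
sketch_log2_pow2(uint64_t v)
{
	unsigned int n = 0;

	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Encode a hugepage size as the page-size bits of the mmap() flags. */
static unsigned int
sketch_pagesz_flags(uint64_t page_sz)
{
	/* this sketch only handles exact powers of two */
	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	return sketch_log2_pow2(page_sz) << SKETCH_HUGE_SHIFT;
}
```

The result would be OR-ed with MAP_HUGETLB and the usual
MAP_ANONYMOUS | MAP_PRIVATE flags before calling mmap().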

It adds a new flag "--mp-alloc", with four possible values:
native (use regular DPDK allocator), anon (use anonymous
mempool), xmem (use externally allocated memory area), and
xmemhuge (use externally allocated hugepage memory area). Old
flag "--mp-anon" is kept for compatibility.
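
As an illustration of the four accepted values, the mapping from the
option string to an allocation mode could be written table-driven, as
sketched below. This is not the code in the patch (which uses a chain of
strcmp() calls and the MP_ALLOC_* constants); the enum and function
names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local enum mirroring the four --mp-alloc modes. */
enum sketch_mp_alloc {
	SKETCH_MP_ALLOC_NATIVE,
	SKETCH_MP_ALLOC_ANON,
	SKETCH_MP_ALLOC_XMEM,
	SKETCH_MP_ALLOC_XMEM_HUGE,
};

/* Map a --mp-alloc argument to a mode; 0 on success, -1 if unknown. */
static int
sketch_parse_mp_alloc(const char *arg, enum sketch_mp_alloc *out)
{
	static const struct {
		const char *name;
		enum sketch_mp_alloc mode;
	} modes[] = {
		{ "native",   SKETCH_MP_ALLOC_NATIVE },
		{ "anon",     SKETCH_MP_ALLOC_ANON },
		{ "xmem",     SKETCH_MP_ALLOC_XMEM },
		{ "xmemhuge", SKETCH_MP_ALLOC_XMEM_HUGE },
	};
	size_t i;

	assert(out != NULL);
	if (arg == NULL)
		return -1;
	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		if (strcmp(arg, modes[i].name) == 0) {
			*out = modes[i].mode;
			return 0;
		}
	}
	/* unknown value: leave *out untouched so the caller can report it */
	return -1;
}
```

A table keeps the accepted names in one place, which makes it harder for
the usage text and the error message to drift apart.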

All external memory is added to the same external heap, but each
mempool allocates and adds its own memory area.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/config.c                 |  21 +-
 app/test-pmd/parameters.c             |  23 +-
 app/test-pmd/testpmd.c                | 318 ++++++++++++++++++++++++--
 app/test-pmd/testpmd.h                |  13 +-
 doc/guides/testpmd_app_ug/run_app.rst |  12 +
 5 files changed, 362 insertions(+), 25 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a0f934932..4789910b3 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2413,6 +2413,23 @@ fwd_config_setup(void)
 		simple_fwd_config_setup();
 }
 
+static const char *
+mp_alloc_to_str(uint8_t mode)
+{
+	switch (mode) {
+	case MP_ALLOC_NATIVE:
+		return "native";
+	case MP_ALLOC_ANON:
+		return "anon";
+	case MP_ALLOC_XMEM:
+		return "xmem";
+	case MP_ALLOC_XMEM_HUGE:
+		return "xmemhuge";
+	default:
+		return "invalid";
+	}
+}
+
 void
 pkt_fwd_config_display(struct fwd_config *cfg)
 {
@@ -2421,12 +2438,12 @@ pkt_fwd_config_display(struct fwd_config *cfg)
 	streamid_t sm_id;
 
 	printf("%s packet forwarding%s - ports=%d - cores=%d - streams=%d - "
-		"NUMA support %s, MP over anonymous pages %s\n",
+		"NUMA support %s, MP allocation mode: %s\n",
 		cfg->fwd_eng->fwd_mode_name,
 		retry_enabled == 0 ? "" : " with retry",
 		cfg->nb_fwd_ports, cfg->nb_fwd_lcores, cfg->nb_fwd_streams,
 		numa_support == 1 ? "enabled" : "disabled",
-		mp_anon != 0 ? "enabled" : "disabled");
+		mp_alloc_to_str(mp_alloc_type));
 
 	if (retry_enabled)
 		printf("TX retry num: %u, delay between TX retries: %uus\n",
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 9220e1c1b..b4016668c 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -190,6 +190,11 @@ usage(char* progname)
 	printf("  --vxlan-gpe-port=N: UPD port of tunnel VXLAN-GPE\n");
 	printf("  --mlockall: lock all memory\n");
 	printf("  --no-mlockall: do not lock all memory\n");
+	printf("  --mp-alloc <native|anon|xmem|xmemhuge>: mempool allocation method.\n"
+	       "    native: use regular DPDK memory to create and populate mempool\n"
+	       "    anon: use regular DPDK memory to create and anonymous memory to populate mempool\n"
+	       "    xmem: use anonymous memory to create and populate mempool\n"
+	       "    xmemhuge: use anonymous hugepage memory to create and populate mempool\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -625,6 +630,7 @@ launch_args_parse(int argc, char** argv)
 		{ "vxlan-gpe-port",		1, 0, 0 },
 		{ "mlockall",			0, 0, 0 },
 		{ "no-mlockall",		0, 0, 0 },
+		{ "mp-alloc",			1, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -743,7 +749,22 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "numa"))
 				numa_support = 1;
 			if (!strcmp(lgopts[opt_idx].name, "mp-anon")) {
-				mp_anon = 1;
+				mp_alloc_type = MP_ALLOC_ANON;
+			}
+			if (!strcmp(lgopts[opt_idx].name, "mp-alloc")) {
+				if (!strcmp(optarg, "native"))
+					mp_alloc_type = MP_ALLOC_NATIVE;
+				else if (!strcmp(optarg, "anon"))
+					mp_alloc_type = MP_ALLOC_ANON;
+				else if (!strcmp(optarg, "xmem"))
+					mp_alloc_type = MP_ALLOC_XMEM;
+				else if (!strcmp(optarg, "xmemhuge"))
+					mp_alloc_type = MP_ALLOC_XMEM_HUGE;
+				else
+					rte_exit(EXIT_FAILURE,
+						"mp-alloc %s invalid - must be: "
+						"native, anon, xmem or xmemhuge\n",
+						 optarg);
 			}
 			if (!strcmp(lgopts[opt_idx].name, "port-numa-config")) {
 				if (parse_portnuma_config(optarg))
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e552..7f4bd62ac 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -27,6 +27,7 @@
 #include <rte_log.h>
 #include <rte_debug.h>
 #include <rte_cycles.h>
+#include <rte_malloc_heap.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_launch.h>
@@ -63,6 +64,22 @@
 
 #include "testpmd.h"
 
+#ifndef MAP_HUGETLB
+/* FreeBSD may not have MAP_HUGETLB (in fact, it probably doesn't) */
+#define HUGE_FLAG (0x40000)
+#else
+#define HUGE_FLAG MAP_HUGETLB
+#endif
+
+#ifndef MAP_HUGE_SHIFT
+/* older kernels (or FreeBSD) will not have this define */
+#define HUGE_SHIFT (26)
+#else
+#define HUGE_SHIFT MAP_HUGE_SHIFT
+#endif
+
+#define EXTMEM_HEAP_NAME "extmem"
+
 uint16_t verbose_level = 0; /**< Silent by default. */
 int testpmd_logtype; /**< Log type for testpmd logs */
 
@@ -88,9 +105,13 @@ uint8_t numa_support = 1; /**< numa enabled by default */
 uint8_t socket_num = UMA_NO_CONFIG;
 
 /*
- * Use ANONYMOUS mapped memory (might be not physically continuous) for mbufs.
+ * Select mempool allocation type:
+ * - native: use regular DPDK memory
+ * - anon: use regular DPDK memory to create mempool, but populate using
+ *         anonymous memory (may not be IOVA-contiguous)
+ * - xmem, xmemhuge: use externally allocated regular or hugepage memory
  */
-uint8_t mp_anon = 0;
+uint8_t mp_alloc_type = MP_ALLOC_NATIVE;
 
 /*
  * Store specified sockets on which memory pool to be used by ports
@@ -527,6 +548,229 @@ set_def_fwd_config(void)
 	set_default_fwd_ports_config();
 }
 
+/* extremely pessimistic estimation of memory required to create a mempool */
+static int
+calc_mem_size(uint32_t nb_mbufs, uint32_t mbuf_sz, size_t pgsz, size_t *out)
+{
+	unsigned int n_pages, mbuf_per_pg, leftover;
+	uint64_t total_mem, mbuf_mem, obj_sz;
+
+	/* there is no good way to predict how much space the mempool will
+	 * occupy because it will allocate chunks on the fly, and some of those
+	 * will come from default DPDK memory while some will come from our
+	 * external memory, so just assume 128MB will be enough for everyone.
+	 */
+	uint64_t hdr_mem = 128 << 20;
+
+	/* account for possible non-contiguousness */
+	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
+	if (obj_sz > pgsz) {
+		TESTPMD_LOG(ERR, "Object size is bigger than page size\n");
+		return -1;
+	}
+
+	mbuf_per_pg = pgsz / obj_sz;
+	leftover = (nb_mbufs % mbuf_per_pg) > 0;
+	n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
+
+	mbuf_mem = n_pages * pgsz;
+
+	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
+
+	if (total_mem > SIZE_MAX) {
+		TESTPMD_LOG(ERR, "Memory size too big\n");
+		return -1;
+	}
+	*out = (size_t)total_mem;
+
+	return 0;
+}
+
+static inline uint32_t
+bsf64(uint64_t v)
+{
+	return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+	if (v == 0)
+		return 0;
+	v = rte_align64pow2(v);
+	return bsf64(v);
+}
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+	/* as per mmap() manpage, all page sizes are log2 of page size
+	 * shifted by MAP_HUGE_SHIFT
+	 */
+	int log2 = log2_u64(page_sz);
+	return (log2 << HUGE_SHIFT);
+}
+
+static void *
+alloc_mem(size_t memsz, size_t pgsz, bool huge)
+{
+	void *addr;
+	int flags;
+
+	/* allocate anonymous hugepages */
+	flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	if (huge)
+		flags |= HUGE_FLAG | pagesz_flags(pgsz);
+
+	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
+	if (addr == MAP_FAILED)
+		return NULL;
+
+	return addr;
+}
+
+struct extmem_param {
+	void *addr;
+	size_t len;
+	size_t pgsz;
+	rte_iova_t *iova_table;
+	unsigned int iova_table_len;
+};
+
+static int
+create_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, struct extmem_param *param,
+		bool huge)
+{
+	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
+			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
+	unsigned int cur_page, n_pages, pgsz_idx;
+	size_t mem_sz, cur_pgsz;
+	rte_iova_t *iovas = NULL;
+	void *addr;
+	int ret;
+
+	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
+		/* skip anything that is too big */
+		if (pgsizes[pgsz_idx] > SIZE_MAX)
+			continue;
+
+		cur_pgsz = pgsizes[pgsz_idx];
+
+		/* if we were told not to allocate hugepages, override */
+		if (!huge)
+			cur_pgsz = sysconf(_SC_PAGESIZE);
+
+		ret = calc_mem_size(nb_mbufs, mbuf_sz, cur_pgsz, &mem_sz);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot calculate memory size\n");
+			return -1;
+		}
+
+		/* allocate our memory */
+		addr = alloc_mem(mem_sz, cur_pgsz, huge);
+
+		/* if we couldn't allocate memory with a specified page size,
+		 * that doesn't mean we can't do it with other page sizes, so
+		 * try another one.
+		 */
+		if (addr == NULL)
+			continue;
+
+		/* store IOVA addresses for every page in this memory area */
+		n_pages = mem_sz / cur_pgsz;
+
+		iovas = malloc(sizeof(*iovas) * n_pages);
+
+		if (iovas == NULL) {
+			TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n");
+			goto fail;
+		}
+		/* lock memory if it's not huge pages */
+		if (!huge)
+			mlock(addr, mem_sz);
+
+		/* populate IOVA addresses */
+		for (cur_page = 0; cur_page < n_pages; cur_page++) {
+			rte_iova_t iova;
+			size_t offset;
+			void *cur;
+
+			offset = cur_pgsz * cur_page;
+			cur = RTE_PTR_ADD(addr, offset);
+			iova = rte_mem_virt2iova(cur);
+
+			iovas[cur_page] = iova;
+		}
+
+		break;
+	}
+	/* if we couldn't allocate anything */
+	if (iovas == NULL)
+		return -1;
+
+	param->addr = addr;
+	param->len = mem_sz;
+	param->pgsz = cur_pgsz;
+	param->iova_table = iovas;
+	param->iova_table_len = n_pages;
+
+	return 0;
+fail:
+	if (iovas)
+		free(iovas);
+	if (addr)
+		munmap(addr, mem_sz);
+
+	return -1;
+}
+
+static int
+setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge)
+{
+	struct extmem_param param = {};
+	int socket_id, ret;
+
+	/* check if our heap exists */
+	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+	if (socket_id < 0) {
+		/* create our heap */
+		ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot create heap\n");
+			return -1;
+		}
+	}
+
+	ret = create_extmem(nb_mbufs, mbuf_sz, &param, huge);
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot create memory area\n");
+		return -1;
+	}
+
+	/* we now have a valid memory area, so add it to heap */
+	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
+			param.addr, param.len, param.iova_table,
+			param.iova_table_len, param.pgsz);
+
+	/* when using VFIO, memory is automatically mapped for DMA by EAL */
+
+	/* not needed any more */
+	free(param.iova_table);
+
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot add memory to heap\n");
+		munmap(param.addr, param.len);
+		return -1;
+	}
+
+	/* success */
+
+	TESTPMD_LOG(DEBUG, "Allocated %zuMB of external memory\n",
+			param.len >> 20);
+
+	return 0;
+}
+
 /*
  * Configuration initialisation done once at init time.
  */
@@ -545,27 +789,59 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
 		pool_name, nb_mbuf, mbuf_seg_size, socket_id);
 
-	if (mp_anon != 0) {
-		rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
-			mb_size, (unsigned) mb_mempool_cache,
-			sizeof(struct rte_pktmbuf_pool_private),
-			socket_id, 0);
-		if (rte_mp == NULL)
-			goto err;
+	switch (mp_alloc_type) {
+	case MP_ALLOC_NATIVE:
+		{
+			/* wrapper to rte_mempool_create() */
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+				mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+			break;
+		}
+	case MP_ALLOC_ANON:
+		{
+			rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				mb_size, (unsigned int) mb_mempool_cache,
+				sizeof(struct rte_pktmbuf_pool_private),
+				socket_id, 0);
+			if (rte_mp == NULL)
+				goto err;
+
+			if (rte_mempool_populate_anon(rte_mp) == 0) {
+				rte_mempool_free(rte_mp);
+				rte_mp = NULL;
+				goto err;
+			}
+			rte_pktmbuf_pool_init(rte_mp, NULL);
+			rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
+			break;
+		}
+	case MP_ALLOC_XMEM:
+	case MP_ALLOC_XMEM_HUGE:
+		{
+			int heap_socket;
+			bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE;
 
-		if (rte_mempool_populate_anon(rte_mp) == 0) {
-			rte_mempool_free(rte_mp);
-			rte_mp = NULL;
-			goto err;
+			if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0)
+				rte_exit(EXIT_FAILURE, "Could not create external memory\n");
+
+			heap_socket =
+				rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+			if (heap_socket < 0)
+				rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");
+
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+					mb_mempool_cache, 0, mbuf_seg_size,
+					heap_socket);
+			break;
+		}
+	default:
+		{
+			rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n");
 		}
-		rte_pktmbuf_pool_init(rte_mp, NULL);
-		rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
-	} else {
-		/* wrapper to rte_mempool_create() */
-		TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
-				rte_mbuf_best_mempool_ops());
-		rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-			mb_mempool_cache, 0, mbuf_seg_size, socket_id);
 	}
 
 err:
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a1f661472..65e0cec90 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -69,6 +69,16 @@ enum {
 	PORT_TOPOLOGY_LOOP,
 };
 
+enum {
+	MP_ALLOC_NATIVE, /**< allocate and populate mempool natively */
+	MP_ALLOC_ANON,
+	/**< allocate mempool natively, but populate using anonymous memory */
+	MP_ALLOC_XMEM,
+	/**< allocate and populate mempool using anonymous memory */
+	MP_ALLOC_XMEM_HUGE
+	/**< allocate and populate mempool using anonymous hugepage memory */
+};
+
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 /**
  * The data structure associated with RX and TX packet burst statistics
@@ -304,7 +314,8 @@ extern uint8_t  numa_support; /**< set by "--numa" parameter */
 extern uint16_t port_topology; /**< set by "--port-topology" parameter */
 extern uint8_t no_flush_rx; /**<set by "--no-flush-rx" parameter */
 extern uint8_t flow_isolate_all; /**< set by "--flow-isolate-all */
-extern uint8_t  mp_anon; /**< set by "--mp-anon" parameter */
+extern uint8_t  mp_alloc_type;
+/**< set by "--mp-anon" or "--mp-alloc" parameter */
 extern uint8_t no_link_check; /**<set by "--disable-link-check" parameter */
 extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f301c2b6f..67a8532a4 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -498,3 +498,15 @@ The commandline options are:
 *   ``--no-mlockall``
 
     Disable locking all memory.
+
+*   ``--mp-alloc <native|anon|xmem|xmemhuge>``
+
+    Select mempool allocation mode:
+
+    * native: create and populate mempool using native DPDK memory
+    * anon: create mempool using native DPDK memory, but populate using
+      anonymous memory
+    * xmem: create and populate mempool using externally and anonymously
+      allocated area
+    * xmemhuge: create and populate mempool using externally and anonymously
+      allocated hugepage area
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v7 20/21] doc: add external memory feature to the release notes
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (19 preceding siblings ...)
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 19/21] app/testpmd: add support for external memory Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Document the addition of external memory support to DPDK.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5065ec1af..4248ff4f9 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,6 +67,11 @@ New Features
   SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
   vdev_netvsc, tap, and failsafe drivers combination.
 
+* **Added support for using externally allocated memory in DPDK.**
+
+  DPDK has gained support for creating new ``rte_malloc`` heaps referencing
+  memory that was created outside of DPDK's own page allocator, and using that
+  memory natively with any other DPDK library or data structure.
 
 API Changes
 -----------
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v7 21/21] doc: add external memory feature to programmer's guide
  2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                             ` (20 preceding siblings ...)
  2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 20/21] doc: add external memory feature to the release notes Anatoly Burakov
@ 2018-10-01 11:05           ` Anatoly Burakov
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add a short chapter on usage of external memory in DPDK to the
Programmer's Guide.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      | 37 +++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d362c9209..00ce64ceb 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -213,6 +213,43 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
+Support for Externally Allocated Memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to use externally allocated memory in DPDK, using a set of malloc
+heap API's. Support for externally allocated memory is implemented through
+overloading the socket ID - externally allocated heaps will have socket ID's
+that would be considered invalid under normal circumstances. Requesting an
+allocation to take place from a specified externally allocated memory is a
+matter of supplying the correct socket ID to DPDK allocator, either directly
+(e.g. through a call to ``rte_malloc``) or indirectly (through data
+structure-specific allocation API's such as ``rte_ring_create``).
+
+Since there is no way DPDK can verify whether memory area is available or valid,
+this responsibility falls on the shoulders of the user. All multiprocess
+synchronization is also the user's responsibility, as well as ensuring that all
+calls to add/attach/detach/remove memory are done in the correct order. It is
+not required to attach to a memory area in all processes - only attach to memory
+areas as needed.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Create a named heap
+* Add memory area(s) to the heap
+    - If IOVA table is not specified, IOVA addresses will be assumed to be
+      unavailable, and DMA mappings will not be performed
+    - Other processes must attach to the memory area before they can use it
+* Get socket ID used for the heap
+* Use normal DPDK allocation procedures, using supplied socket ID
+* If memory area is no longer needed, it can be removed from the heap
+    - Other processes must detach from this memory area before it can be removed
+* If heap is no longer needed, remove it
+    - Socket ID will become invalid and will not be reused
+
+For more information, please refer to ``rte_malloc`` API documentation,
+specifically the ``rte_malloc_heap_*`` family of function calls.
+
 PCI Access
 ~~~~~~~~~~
 
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v8 00/21] Support externally allocated memory in DPDK
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
                                 ` (21 more replies)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
                               ` (20 subsequent siblings)
  21 siblings, 22 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
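
As a rough illustration of the socket ID scheme in the list above, here is a
standalone sketch in which each named heap carries its own socket ID, and
external heaps receive IDs past the NUMA node range in order of creation. All
names in it (heap_sketch, SKETCH_MAX_NUMA_NODES, sketch_heap_create,
sketch_socket_to_heap_idx) are hypothetical, internal heaps are left out, and
the table layout is an assumption made for illustration only. It is not the
code from malloc_heap.c.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_NUMA_NODES 2 /* hypothetical: IDs below this are NUMA nodes */
#define SKETCH_MAX_HEAPS 8      /* hypothetical table size */
#define SKETCH_NAME_LEN 32

struct heap_sketch {
	char name[SKETCH_NAME_LEN]; /* empty name marks an unused slot */
	int socket_id;
};

static struct heap_sketch heaps[SKETCH_MAX_HEAPS];
static int next_socket_id = SKETCH_MAX_NUMA_NODES;

/* Create a named external heap; returns its socket ID, or -1 on failure. */
static int
sketch_heap_create(const char *name)
{
	int i;

	if (name == NULL || name[0] == '\0' ||
			strlen(name) >= SKETCH_NAME_LEN)
		return -1;

	/* heap names must be unique */
	for (i = 0; i < SKETCH_MAX_HEAPS; i++)
		if (strcmp(heaps[i].name, name) == 0)
			return -1;

	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		if (heaps[i].name[0] != '\0')
			continue;
		snprintf(heaps[i].name, SKETCH_NAME_LEN, "%s", name);
		/* IDs are handed out in creation order and never reused */
		heaps[i].socket_id = next_socket_id++;
		return heaps[i].socket_id;
	}
	return -1; /* no free slot */
}

/* Map a socket ID to a heap index, the way an allocator might pick a heap. */
static int
sketch_socket_to_heap_idx(int socket_id)
{
	int i;

	for (i = 0; i < SKETCH_MAX_HEAPS; i++)
		if (heaps[i].name[0] != '\0' &&
				heaps[i].socket_id == socket_id)
			return i;
	return -1;
}
```

The point of the sketch is that a caller only ever passes a socket ID: the
same lookup serves internal and external heaps, so existing allocation APIs
do not need a separate "heap name" parameter.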

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.

Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

v8 -> v7 changes:
- Rebase on latest master
- More documentation on ABI changes

v7 -> v6 changes:
- Fixed missing IOVA address setup in testpmd
- Fixed MLX drivers as per Yongseok's comments
- Added a check for invalid heap idx on adding memory to heap

v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments

v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes

v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
  IOVA-contiguousness when dealing with externally allocated memory

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
  comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (21):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: add function to check if socket is external
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  malloc: enable event callbacks for external memory
  test: add unit tests for external memory support
  app/testpmd: add support for external memory
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide

 app/test-pmd/config.c                         |  21 +-
 app/test-pmd/parameters.c                     |  23 +-
 app/test-pmd/testpmd.c                        | 320 ++++++++++++-
 app/test-pmd/testpmd.h                        |  13 +-
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  37 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  37 +-
 doc/guides/testpmd_app_ug/run_app.rst         |  12 +
 drivers/bus/fslmc/fslmc_vfio.c                |  13 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx5/mlx5.c                       |   4 +-
 drivers/net/virtio/virtio_user/vhost_kernel.c |   3 +
 .../net/virtio/virtio_user/virtio_user_dev.c  |   6 +
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   9 +-
 lib/librte_eal/common/include/rte_malloc.h    | 192 ++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_elem.c           |  10 +-
 lib/librte_eal/common/malloc_heap.c           | 320 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 429 +++++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  27 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   8 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  57 ++-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 ++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 45 files changed, 1930 insertions(+), 138 deletions(-)
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 17:01               ` Stephen Hemminger
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
                               ` (19 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, Bruce Richardson,
	laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.
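
To make the difference concrete, here is a standalone sketch of the same
address range check done both ways. The struct is a hypothetical, cut-down
stand-in for ``rte_memseg_list`` (``n_segs`` plays the role of
``memseg_arr.len``), and the function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct rte_memseg_list. */
struct msl_sketch {
	void *base_va;       /* start of the VA area */
	uint64_t page_sz;    /* page size of every segment */
	unsigned int n_segs; /* stands in for memseg_arr.len */
	size_t len;          /* new field: total length in bytes */
};

/* Old approach: derive the length from page size and fbarray length. */
static int
addr_in_list_old(const struct msl_sketch *msl, const void *addr)
{
	uintptr_t start = (uintptr_t)msl->base_va;
	uintptr_t end = start + (size_t)msl->page_sz * msl->n_segs;
	uintptr_t a = (uintptr_t)addr;

	return a >= start && a < end;
}

/* New approach: the list already knows how much memory it covers. */
static int
addr_in_list_new(const struct msl_sketch *msl, const void *addr)
{
	uintptr_t start = (uintptr_t)msl->base_va;
	uintptr_t end = start + msl->len;
	uintptr_t a = (uintptr_t)addr;

	return a >= start && a < end;
}
```

Both functions give the same answer as long as ``len`` is kept equal to
``page_sz * n_segs``; the second one simply does not need to know how the
list is backed.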

This breaks ABI, so bump the EAL ABI version and document the
change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 doc/guides/rel_notes/release_18_11.rst            | 8 +++++++-
 drivers/bus/pci/linux/pci.c                       | 2 +-
 lib/librte_eal/bsdapp/eal/Makefile                | 2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
 lib/librte_eal/common/eal_common_memory.c         | 5 ++---
 lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
 lib/librte_eal/linuxapp/eal/Makefile              | 2 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
 lib/librte_eal/meson.build                        | 2 +-
 10 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 9c00e33cc..9c17762a5 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -134,6 +134,12 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK:
+         - structure ``rte_memseg_list`` now has a new field indicating length
+           of memory addressed by the segment list
+
+
 Removed Items
 -------------
 
@@ -179,7 +185,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 static int
 find_max_end_va(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t sz = msl->memseg_arr.len * msl->page_sz;
+	size_t sz = msl->len;
 	void *end_va = RTE_PTR_ADD(msl->base_va, sz);
 	void **max_va = arg;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
 		}
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
+		msl->len = internal_config.memory;
 		msl->socket_id = 0;
 
 		/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
 
 	/* a memseg list was specified, check if it's the right one */
 	start = msl->base_va;
-	end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+	end = RTE_PTR_ADD(start, msl->len);
 
 	if (addr < start || addr >= end)
 		return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
 		msl = &mcfg->memsegs[msl_idx];
 
 		start = msl->base_va;
-		end = RTE_PTR_ADD(start,
-				(size_t)msl->page_sz * msl->memseg_arr.len);
+		end = RTE_PTR_ADD(start, msl->len);
 		if (addr >= start && addr < end)
 			break;
 	}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
+	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	int msl_idx, seg_idx, ret, dir_fd = -1;
 
 	start_addr = (uintptr_t) msl->base_va;
-	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+	end_addr = start_addr + msl->len;
 
 	if ((uintptr_t)wa->ms->addr < start_addr ||
 			(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 		return -1;
 	}
 	local_msl->base_va = primary_msl->base_va;
+	local_msl->len = primary_msl->len;
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
 		msl->socket_id = 0;
+		msl->len = internal_config.memory;
 
 		/* populate memsegs. each memseg is one page long */
 		for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
 		if (msl->memseg_arr.count > 0)
 			continue;
 		/* this is an unused list, deallocate it */
-		mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
 
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
                               ` (18 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.

All existing callers of the memseg walk functions were adjusted to
ignore external segments where it made sense to do so.
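
As an illustration only, the pattern applied to each walk callback can be
sketched as below. This is a minimal standalone sketch, not DPDK code: the
struct memseg_list_sketch type, the walk_lists() helper and
total_internal_size() are hypothetical stand-ins for rte_memseg_list,
rte_memseg_list_walk() and a caller such as physmem_size(); only the
early return on the external flag reflects what this patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_memseg_list (only fields used here). */
struct memseg_list_sketch {
	uint64_t page_sz;      /* page size for all segments in this list */
	size_t n_pages;        /* stand-in for memseg_arr.count */
	unsigned int external; /* 1 if this list points to external memory */
};

typedef int (*list_walk_fn)(const struct memseg_list_sketch *msl, void *arg);

/* Hypothetical walker: visits every list; a non-zero return stops the walk. */
static int
walk_lists(const struct memseg_list_sketch *lists, size_t n,
		list_walk_fn fn, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = fn(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback that only accounts for internal memory. */
static int
internal_size_cb(const struct memseg_list_sketch *msl, void *arg)
{
	uint64_t *total_len = arg;

	/* the walk also visits external lists, so skip them explicitly */
	if (msl->external)
		return 0;

	*total_len += (uint64_t)msl->n_pages * msl->page_sz;
	return 0;
}

/* Sum the size of all internal (non-external) lists. */
uint64_t
total_internal_size(const struct memseg_list_sketch *lists, size_t n)
{
	uint64_t total = 0;

	walk_lists(lists, n, internal_size_cb, &total);
	return total;
}
```

Returning 0 rather than an error keeps the walk going, so internal lists
that come after an external one are still visited.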

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, in which case we need to ignore
page sizes that belong to other heaps or other sockets. Previously,
assuming that all page sizes were relevant was not a problem, but
it will be now, so we have to match socket ID with page size when
calculating the minimum page size for a mempool.
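
For illustration, the matching rule can be sketched outside of DPDK as
follows. This is a sketch under stated assumptions: struct msl_sketch,
SKETCH_SOCKET_ID_ANY and min_page_size_for_socket() are hypothetical
names, and the fallback argument stands in for the getpagesize() call
used by the real code. The rule itself (exact socket match, or any
non-external list when no socket was requested) follows the change to
find_min_pagesz() in this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of DPDK's SOCKET_ID_ANY, which is defined as -1. */
#define SKETCH_SOCKET_ID_ANY (-1)

/* Hypothetical stand-in for struct rte_memseg_list (only fields used here). */
struct msl_sketch {
	int socket_id;
	uint64_t page_sz;
	unsigned int external;
};

/*
 * Return the smallest page size usable for the given socket ID, or the
 * caller-supplied fallback if no list matches.
 */
size_t
min_page_size_for_socket(const struct msl_sketch *lists, size_t n,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct msl_sketch *msl = &lists[i];
		bool valid;

		/* exact match: may be native or external memory */
		valid = msl->socket_id == socket_id;
		/* "any socket": only native memory is considered */
		valid |= socket_id == SKETCH_SOCKET_ID_ANY &&
				msl->external == 0;

		if (valid && msl->page_sz < min)
			min = (size_t)msl->page_sz;
	}
	return min == SIZE_MAX ? fallback : min;
}
```

With this rule, a small page size on an external heap can no longer
lower the minimum that is computed for a mempool on a native socket.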

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        |  9 ++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  6 +++-
 drivers/net/mlx5/mlx5.c                       |  4 ++-
 drivers/net/virtio/virtio_user/vhost_kernel.c |  3 ++
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_elem.c           | 10 ++++--
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/common/rte_malloc.c            |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 19 files changed, 119 insertions(+), 39 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 9c17762a5..d55e12a27 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -102,6 +102,12 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* eal: ``rte_memseg_list`` structure now has an additional flag indicating
+  whether the memseg list is externally allocated. This will have implications
+  for any users of memseg-walk-related functions, as they will now have to skip
+  externally allocated segments in most cases if the intent is to only iterate
+  over internal DPDK memory.
+
 * mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
   functions were deprecated since 17.05 and are replaced by
   ``rte_mbuf_raw_free()`` and ``rte_pktmbuf_prefree_seg()``.
@@ -118,7 +124,6 @@ API Changes
   To request keeping CRC, application should set ``DEV_RX_OFFLOAD_KEEP_CRC`` Rx
   offload.
 
-
 ABI Changes
 -----------
 
@@ -138,6 +143,8 @@ ABI Changes
        supporting external memory in DPDK:
          - structure ``rte_memseg_list`` now has a new field indicating length
            of memory addressed by the segment list
+         - structure ``rte_memseg_list`` now has a new flag indicating whether
+           the memseg list refers to external memory
 
 
 Removed Items
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..cb33dd891 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -318,11 +318,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 
 static int
 fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+		const struct rte_memseg *ms, void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	/* if IOVA address is invalid, skip */
+	if (ms->iova == RTE_BAD_IOVA)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fd89e2af3..af4a78ce9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,13 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b3bfcb76f..990ce80ce 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -78,6 +78,9 @@ add_memseg_list(const struct rte_memseg_list *msl, void *arg)
 	void *start_addr;
 	uint64_t len;
 
+	if (msl->external)
+		return 0;
+
 	if (vm->nregions >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 	contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
 
 	/* if we're in IOVA as VA mode, or if we're in legacy mode with
-	 * hugepages, all elements are IOVA-contiguous.
+	 * hugepages, all elements are IOVA-contiguous. however, we can only
+	 * make these assumptions about internal memory - externally allocated
+	 * segments have to be checked.
 	 */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA ||
-			(internal_config.legacy_mem && rte_eal_has_hugepages()))
+	if (!elem->msl->external &&
+			(rte_eal_iova_mode() == RTE_IOVA_VA ||
+				(internal_config.legacy_mem &&
+					rte_eal_has_hugepages())))
 		return RTE_PTR_DIFF(data_end, contig_seg_start);
 
 	cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
 	if (elem == NULL)
 		return RTE_BAD_IOVA;
 
-	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+	if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return (uintptr_t) addr;
 
 	ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1

* [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (2 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 04/21] mem: do not check for invalid socket ID Anatoly Burakov
                               ` (17 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bruce Richardson, John McNamara,
	Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, shreyansh.jain, shahafs, arybchenko,
	alejandro.lucero

Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is the NUMA
node's index within the detected NUMA node list. Heap IDs for
external heaps will be assigned in order of their creation.
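
The socket-to-heap translation that this implies can be sketched as a
linear search over the heap array. The sketch below is standalone and
illustrative: struct heap_sketch and socket_to_heap_id() are
hypothetical names, the heap array is passed in rather than read from
the shared memory config, and the socket ID values used for external
heaps are up to the caller.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct malloc_heap (only fields used here). */
struct heap_sketch {
	unsigned int socket_id; /* socket ID this heap is associated with */
	size_t total_size;
};

/*
 * Translate a socket ID into an index into the heap array.
 * Returns the index of the first heap with a matching socket ID,
 * or -1 if no heap is associated with that socket ID.
 */
int
socket_to_heap_id(const struct heap_sketch *heaps, size_t n_heaps,
		unsigned int socket_id)
{
	size_t i;

	for (i = 0; i < n_heaps; i++) {
		if (heaps[i].socket_id == socket_id)
			return (int)i;
	}
	return -1;
}
```

Because the array index no longer has to equal the socket ID, callers
index the heap array with the returned heap ID and treat a negative
result as an invalid socket ID.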

This breaks the ABI, so document the changes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 doc/guides/rel_notes/release_18_11.rst        |   5 +-
 .../common/include/rte_eal_memconfig.h        |   4 +-
 .../common/include/rte_malloc_heap.h          |   1 +
 lib/librte_eal/common/malloc_heap.c           | 102 +++++++++++++-----
 lib/librte_eal/common/malloc_heap.h           |   3 +
 lib/librte_eal/common/rte_malloc.c            |  41 ++++---
 8 files changed, 114 insertions(+), 44 deletions(-)

diff --git a/config/common_base b/config/common_base
index 155c7d40e..b52770b27 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 567051b9c..5dd2ac1ad 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
 #define RTE_BUILD_SHARED_LIB
 
 /* EAL defines */
+#define RTE_MAX_HEAPS 32
 #define RTE_MAX_MEMSEG_LISTS 128
 #define RTE_MAX_MEMSEG_PER_LIST 8192
 #define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index d55e12a27..c627c1e88 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -145,7 +145,10 @@ ABI Changes
            of memory addressed by the segment list
          - structure ``rte_memseg_list`` now has a new flag indicating whether
            the memseg list refers to external memory
-
+         - structure ``rte_malloc_heap`` now has a new field indicating socket
+           ID the malloc heap belongs to
+         - structure ``rte_mem_config`` has had its ``malloc_heaps`` array
+           resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 6baa6854f..d7920a4e0 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
 
-	/* Heaps of Malloc per socket */
-	struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+	/* Heaps of Malloc */
+	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {
 
 	unsigned alloc_count;
 	size_t total_size;
+	unsigned int socket_id;
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..a9cfa423f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 	return check_flag & flags;
 }
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (heap->socket_id == socket_id)
+			return i;
+	}
+	return -1;
+}
+
 /*
  * Expand the heap with a memory area.
  */
@@ -93,12 +108,17 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct rte_memseg_list *found_msl;
 	struct malloc_heap *heap;
-	int msl_idx;
+	int msl_idx, heap_idx;
 
 	if (msl->external)
 		return 0;
 
-	heap = &mcfg->malloc_heaps[msl->socket_id];
+	heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+	if (heap_idx < 0) {
+		RTE_LOG(ERR, EAL, "Memseg list has invalid socket id\n");
+		return -1;
+	}
+	heap = &mcfg->malloc_heaps[heap_idx];
 
 	/* msl is const, so find it */
 	msl_idx = msl - mcfg->memsegs;
@@ -111,6 +131,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
+	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -561,12 +582,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 
 /* this will try lower page sizes first */
 static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
-		unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+	int socket_id;
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -584,12 +607,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
 	 * we may still be able to allocate memory from appropriate page sizes,
 	 * we just need to request more memory first.
 	 */
+
+	socket_id = rte_socket_id_by_idx(heap_id);
+	/*
+	 * if socket ID is negative, we cannot find a socket ID for this heap -
+	 * which means it's an external heap. those can have unexpected page
+	 * sizes, so if the user asked to allocate from there - assume user
+	 * knows what they're doing, and allow allocating from there with any
+	 * page size flags.
+	 */
+	if (socket_id < 0)
+		size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
 	ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
 	if (ret != NULL)
 		goto alloc_unlock;
 
-	if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
-			contig)) {
+	/* if socket ID is invalid, this is an external heap */
+	if (socket_id < 0)
+		goto alloc_unlock;
+
+	if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+			bound, contig)) {
 		ret = heap_alloc(heap, type, size, flags, align, bound, contig);
 
 		/* this should have succeeded */
@@ -605,7 +644,7 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, heap_id, i;
 	void *ret;
 
 	/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +659,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
-			contig);
+	ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+			bound, contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
-	/* try other heaps */
+	/* try other heaps. we are only iterating through native DPDK sockets,
+	 * so external heaps won't be included.
+	 */
 	for (i = 0; i < (int) rte_socket_count(); i++) {
-		cur_socket = rte_socket_id_by_idx(i);
-		if (cur_socket == socket)
+		if (i == heap_id)
 			continue;
-		ret = heap_alloc_on_socket(type, size, cur_socket, flags,
-				align, bound, contig);
+		ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+				bound, contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -643,11 +685,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 }
 
 static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
-		size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+		unsigned int flags, size_t align, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -665,7 +707,7 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		size_t align, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, i, cur_socket, heap_id;
 	void *ret;
 
 	/* return NULL if align is not power-of-2 */
@@ -680,11 +722,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+	ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
 			contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
@@ -694,8 +738,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		cur_socket = rte_socket_id_by_idx(i);
 		if (cur_socket == socket)
 			continue;
-		ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
-				align, contig);
+		ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+				contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -760,7 +804,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* ...of which we can't avail if we are in legacy mode, or if this is an
 	 * externally allocated segment.
 	 */
-	if (internal_config.legacy_mem || msl->external)
+	if (internal_config.legacy_mem || (msl->external > 0))
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
@@ -917,7 +961,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 int
 malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +999,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f);
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
 int
 rte_eal_malloc_heap_init(void);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 47ca5a742..73d6df31d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx, ret = -1;
 
-	if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
-		return -1;
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
 
-	return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+	heap_idx = malloc_socket_to_heap_id(socket);
+	if (heap_idx < 0)
+		goto unlock;
+
+	ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+			socket_stats);
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
 }
 
 /*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int idx;
 
-	for (idx = 0; idx < rte_socket_count(); idx++) {
-		unsigned int socket = rte_socket_id_by_idx(idx);
-		fprintf(f, "Heap on socket %i:\n", socket);
-		malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		fprintf(f, "Heap id: %u\n", idx);
+		malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
 	}
 
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
 /*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
 void
 rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 {
-	unsigned int socket;
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int heap_id;
 	struct rte_malloc_socket_stats sock_stats;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
 	/* Iterate through all initialised heaps */
-	for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
-		if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
-			continue;
+	for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 
-		fprintf(f, "Socket:%u\n", socket);
+		malloc_heap_get_stats(heap, &sock_stats);
+
+		fprintf(f, "Heap id:%u\n", heap_id);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
 		fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
 	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	return;
 }
 
-- 
2.17.1


* [dpdk-dev] [PATCH v8 04/21] mem: do not check for invalid socket ID
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (3 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 05/21] flow_classify: " Anatoly Burakov
                               ` (16 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
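
For illustration only (not part of the patch), here is a minimal sketch
of the idea, assuming a reduced heap table. The names ``heap_entry``,
``socket_to_heap_idx`` and ``caller_accepts`` and the constants are
hypothetical stand-ins, not DPDK definitions. Callers only reject
negative IDs, and the heap table lookup decides whether an ID is valid.

```c
#include <assert.h>

/* Illustrative constants; real values come from the DPDK build config. */
#define MAX_HEAPS  16
#define ANY_SOCKET (-1)	/* stand-in for SOCKET_ID_ANY */

/* Hypothetical, reduced heap record: only the socket ID matters here. */
struct heap_entry {
	int in_use;
	int socket_id;
};

/* Map a socket ID to a heap index, or return -1 if no heap claims it. */
static int
socket_to_heap_idx(const struct heap_entry *heaps, int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (heaps[i].in_use && heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}

/* Caller-side check: no upper bound, only negative IDs are rejected. */
static int
caller_accepts(int socket_id)
{
	return socket_id == ANY_SOCKET || socket_id >= 0;
}
```

With this split, a large ID such as 256 passes the caller-side check and
is accepted or rejected by the lookup alone.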

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index c627c1e88..9583f3eda 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -107,6 +107,13 @@ API Changes
   for any users of memseg-walk-related functions, as they will now have to skip
   externally allocated segments in most cases if the intent is to only iterate
   over internal DPDK memory.
+  ``socket_id`` parameter across the entire DPDK has gained additional meaning,
+  as some socket ID's will now be representing externally allocated memory. No
+  changes will be required for existing code as backwards compatibility will be
+  kept, and those who do not use this feature will not see these extra socket
+  ID's. Any new API's must not check socket ID parameters themselves, and must
+  instead leave it to the memory subsystem to decide whether socket ID is a
+  valid one.
 
 * mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
   functions were deprecated since 17.05 and are replaced by
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index a9cfa423f..09b06061d 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -651,7 +651,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1


* [dpdk-dev] [PATCH v8 05/21] flow_classify: do not check for invalid socket ID
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (4 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 06/21] pipeline: " Anatoly Burakov
                               ` (15 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: Bernard Iremonger, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
 lib/librte_flow_classify/rte_flow_classify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_FLOW_CLASSIFY_LOG(ERR,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1


* [dpdk-dev] [PATCH v8 06/21] pipeline: do not check for invalid socket ID
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (5 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 05/21] flow_classify: " Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 07/21] sched: " Anatoly Burakov
                               ` (14 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 lib/librte_pipeline/rte_pipeline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_LOG(ERR, PIPELINE,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1


* [dpdk-dev] [PATCH v8 07/21] sched: do not check for invalid socket ID
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (6 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 06/21] pipeline: " Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps Anatoly Burakov
                               ` (13 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 lib/librte_sched/rte_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
 		return -1;
 
 	/* socket */
-	if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+	if (params->socket < 0)
 		return -3;
 
 	/* rate */
-- 
2.17.1


* [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (7 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 07/21] sched: " Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
                               ` (12 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.

This breaks the ABI, so document the change.
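
For illustration only (not part of the patch), the default naming scheme
can be sketched as below. ``default_heap_name`` is a hypothetical helper,
and ``HEAP_NAME_MAX_LEN`` merely mirrors the value this patch gives to
RTE_HEAP_NAME_MAX_LEN.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HEAP_NAME_MAX_LEN 32	/* mirrors RTE_HEAP_NAME_MAX_LEN */

/* Build the default name for the heap that serves a given NUMA socket. */
static void
default_heap_name(char *dst, size_t len, int socket_id)
{
	/* snprintf() always NUL-terminates the output when len > 0 */
	snprintf(dst, len, "socket_%i", socket_id);
}
```

A heap on NUMA socket 1 would therefore be named "socket_1".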

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst          |  2 ++
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 17 ++++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 9583f3eda..a6bddaaf4 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -156,6 +156,8 @@ ABI Changes
            ID the malloc heap belongs to
          - structure ``rte_mem_config`` has had its ``malloc_heaps`` array
            resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
+         - structure ``rte_malloc_heap`` now has a ``heap_name`` member
+
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	size_t total_size;
 	unsigned int socket_id;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 09b06061d..b28905817 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -131,7 +131,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1024,6 +1023,22 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign names to default DPDK heaps */
+		for (i = 0; i < rte_socket_count(); i++) {
+			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+			char heap_name[RTE_HEAP_NAME_MAX_LEN];
+			int socket_id = rte_socket_id_by_idx(i);
+
+			snprintf(heap_name, sizeof(heap_name) - 1,
+					"socket_%i", socket_id);
+			strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+			heap->socket_id = socket_id;
+		}
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1


* [dpdk-dev] [PATCH v8 09/21] malloc: add function to query socket ID of named heap
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (8 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 10/21] malloc: add function to check if socket is external Anatoly Burakov
                               ` (11 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

When external heaps are created, each will have its own "fake"
socket ID, so add a function that maps a heap name to its
socket ID.
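
For illustration only (not part of the patch), the lookup can be sketched
over a reduced heap table. ``named_heap`` and ``heap_name_to_socket`` are
hypothetical names, plain ``errno`` stands in for ``rte_errno``, and no
locking is shown.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define HEAP_NAME_MAX_LEN 32	/* mirrors RTE_HEAP_NAME_MAX_LEN */
#define MAX_HEAPS 4		/* illustrative table size */

/* Hypothetical, reduced heap record. */
struct named_heap {
	char name[HEAP_NAME_MAX_LEN];
	int socket_id;
};

/* Return the socket ID for a heap name, or -1 with errno set. */
static int
heap_name_to_socket(const struct named_heap *heaps, const char *name)
{
	int i;

	/* reject NULL, empty, and over-long (unterminated) names */
	if (name == NULL || name[0] == '\0' ||
			memchr(name, '\0', HEAP_NAME_MAX_LEN) == NULL) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < MAX_HEAPS; i++) {
		if (strncmp(name, heaps[i].name, HEAP_NAME_MAX_LEN) == 0)
			return heaps[i].socket_id;
	}
	errno = ENOENT;
	return -1;
}
```

Unused table slots have an empty name, so they can never match a valid
(non-empty) query.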

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 37 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..8870732a6 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,20 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Find socket ID corresponding to a named heap.
+ *
+ * @param name
+ *   Heap name to find socket ID for
+ * @return
+ *   Socket ID in case of success (a non-negative number)
+ *   -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``name`` was NULL
+ *     ENOENT - heap identified by the name ``name`` was not found
+ */
+int __rte_experimental
+rte_malloc_heap_get_socket(const char *name);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72632da56..b807dfe09 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <sys/queue.h>
 
+#include <rte_errno.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
 #include <rte_eal.h>
@@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f)
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
+int
+rte_malloc_heap_get_socket(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int idx;
+	int ret;
+
+	if (name == NULL ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) {
+			heap = tmp;
+			break;
+		}
+	}
+
+	if (heap != NULL) {
+		ret = heap->socket_id;
+	} else {
+		rte_errno = ENOENT;
+		ret = -1;
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bbb0..d8f9665b8 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v8 10/21] malloc: add function to check if socket is external
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (9 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 11/21] malloc: allow creating malloc heaps Anatoly Burakov
                               ` (10 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, alejandro.lucero

An API is needed to check whether a particular socket ID belongs
to an internal or external heap. The prime user of this would be
the mempool allocator, because the normal assumption of IOVA
contiguousness in IOVA as VA mode does not hold for externally
allocated memory.
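
For illustration only, the decision the mempool populate path makes with
this information can be sketched as a pure function. ``populate_plan``
and ``plan_populate`` are hypothetical names; the boolean inputs stand in
for the EAL queries used in the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: how the populate path should allocate. */
struct populate_plan {
	bool no_pageshift;	/* ignore page boundaries entirely */
	bool try_contig;	/* first try one IOVA-contiguous chunk */
};

static struct populate_plan
plan_populate(bool no_contig, bool iova_as_va, bool has_hugepages,
		bool external)
{
	struct populate_plan p;

	/* IOVA as VA implies contiguity only for internal memory */
	p.no_pageshift = no_contig || (!external && iova_as_va);
	/* external memory may be contiguous even without hugepages */
	p.try_contig = !no_contig && !p.no_pageshift &&
			(has_hugepages || external);
	return p;
}
```

With internal memory in IOVA as VA mode, page boundaries are ignored as
before. For an external heap in the same mode they are honoured, and a
contiguous allocation is attempted first.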

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 15 +++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 25 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 lib/librte_mempool/rte_mempool.c           | 22 ++++++++++++++++---
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8870732a6..403271ddc 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -277,6 +277,21 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_get_socket(const char *name);
 
+/**
+ * Check if a given socket ID refers to externally allocated memory.
+ *
+ * @note Passing SOCKET_ID_ANY will return 0.
+ *
+ * @param socket_id
+ *   Socket ID to check
+ * @return
+ *   1 if socket ID refers to externally allocated memory
+ *   0 if socket ID refers to internal DPDK memory
+ *   -1 if socket ID is invalid
+ */
+int __rte_experimental
+rte_malloc_heap_socket_is_external(int socket_id);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b807dfe09..fa81d7862 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -220,6 +220,31 @@ rte_malloc_heap_get_socket(const char *name)
 	return ret;
 }
 
+int
+rte_malloc_heap_socket_is_external(int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int idx;
+	int ret = -1;
+
+	if (socket_id == SOCKET_ID_ANY)
+		return 0;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if ((int)tmp->socket_id == socket_id) {
+			/* external memory always has large socket ID's */
+			ret = tmp->socket_id >= RTE_MAX_NUMA_NODES;
+			break;
+		}
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d8f9665b8..bd60506af 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2ed539f01..683b216f9 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -428,12 +428,18 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	rte_iova_t iova;
 	unsigned mz_id, n;
 	int ret;
-	bool no_contig, try_contig, no_pageshift;
+	bool no_contig, try_contig, no_pageshift, external;
 
 	ret = mempool_ops_alloc_once(mp);
 	if (ret != 0)
 		return ret;
 
+	/* check if we can retrieve a valid socket ID */
+	ret = rte_malloc_heap_socket_is_external(mp->socket_id);
+	if (ret < 0)
+		return -EINVAL;
+	external = ret;
+
 	/* mempool must not be populated */
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
@@ -481,9 +487,19 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	 * in one contiguous chunk as well (otherwise we might end up wasting a
 	 * 1G page on a 10MB memzone). If we fail to get enough contiguous
 	 * memory, then we'll go and reserve space page-by-page.
+	 *
+	 * We also have to take into account the fact that memory that we're
+	 * going to allocate from can belong to an externally allocated memory
+	 * area, in which case the assumption of IOVA as VA mode being
+	 * synonymous with IOVA contiguousness will not hold. We should also try
+	 * to go for contiguous memory even if we're in no-huge mode, because
+	 * external memory may in fact be IOVA-contiguous.
 	 */
-	no_pageshift = no_contig || rte_eal_iova_mode() == RTE_IOVA_VA;
-	try_contig = !no_contig && !no_pageshift && rte_eal_has_hugepages();
+	external = rte_malloc_heap_socket_is_external(mp->socket_id) == 1;
+	no_pageshift = no_contig ||
+			(!external && rte_eal_iova_mode() == RTE_IOVA_VA);
+	try_contig = !no_contig && !no_pageshift &&
+			(rte_eal_has_hugepages() || external);
 
 	if (no_pageshift) {
 		pg_sz = 0;
-- 
2.17.1


* [dpdk-dev] [PATCH v8 11/21] malloc: allow creating malloc heaps
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (10 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 10/21] malloc: add function to check if socket is external Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 12/21] malloc: allow destroying heaps Anatoly Burakov
                               ` (9 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to allow creating new malloc heaps. They will be created
with socket IDs starting at or above RTE_MAX_NUMA_NODES, to avoid
clashing with internal heaps.

This breaks the ABI, so document the change.
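
For illustration only (not part of the patch), socket ID assignment for
new heaps can be sketched as below. ``id_allocator`` and its helpers are
hypothetical, ``MAX_NUMA_NODES`` is an example value, and locking is
omitted (the real code relies on the memory hotplug write lock).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NUMA_NODES 8	/* illustrative; set by build config */
#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b))
/* external IDs start well above any real NUMA node ID */
#define EXT_HEAP_MIN_SOCKET_ID CONST_MAX(1 << 8, MAX_NUMA_NODES)

/* Hypothetical holder for the "next external socket ID" counter. */
struct id_allocator {
	uint32_t next_socket_id;
};

static void
id_allocator_init(struct id_allocator *a)
{
	a->next_socket_id = EXT_HEAP_MIN_SOCKET_ID;
}

/* Hand out the next external socket ID, or -1 once int range is used up. */
static int
id_allocator_next(struct id_allocator *a)
{
	if (a->next_socket_id > INT32_MAX)
		return -1;
	return (int)a->next_socket_id++;
}
```

IDs are never reused in this sketch, so a heap created after another one
was destroyed still receives a fresh socket ID.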

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst        |  2 +
 .../common/include/rte_eal_memconfig.h        |  3 ++
 lib/librte_eal/common/include/rte_malloc.h    | 19 +++++++
 lib/librte_eal/common/malloc_heap.c           | 37 +++++++++++++
 lib/librte_eal/common/malloc_heap.h           |  3 ++
 lib/librte_eal/common/rte_malloc.c            | 52 +++++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  1 +
 7 files changed, 117 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index a6bddaaf4..cb6308b1f 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -157,6 +157,8 @@ ABI Changes
          - structure ``rte_mem_config`` has had its ``malloc_heaps`` array
            resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
          - structure ``rte_malloc_heap`` now has a ``heap_name`` member
+         - structure ``rte_mem_config`` has been extended to contain the next
+           socket ID for externally allocated segments
 
 
 Removed Items
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
 	/* Heaps of Malloc */
 	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
+	/* next socket ID for external malloc heap */
+	int next_socket_id;
+
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
 	 */
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b28905817..00fdf54f7 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1019,6 +1023,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	uint32_t next_socket_id = mcfg->next_socket_id;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id;
+
+	/* we hold a global mem hotplug writelock, so it's safe to increment */
+	mcfg->next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
@@ -1026,6 +1060,9 @@ rte_eal_malloc_heap_init(void)
 	unsigned int i;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign min socket ID to external heaps */
+		mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
 		/* assign names to default DPDK heaps */
 		for (i = 0; i < rte_socket_count(); i++) {
 			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1
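
The creation path in this patch (name validation, duplicate check, first free slot, then socket ID assignment) can be sketched in isolation. This is a simplified stand-alone model and not the DPDK code: every `toy_`/`TOY_` name is hypothetical, the table sizes are arbitrary, locking is omitted, and the function returns the new socket ID or a negative errno value instead of setting rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_HEAPS 4      /* stand-in for RTE_MAX_HEAPS */
#define TOY_NAME_MAX_LEN 32  /* stand-in for RTE_HEAP_NAME_MAX_LEN */
#define TOY_MAX_NUMA_NODES 8 /* stand-in for RTE_MAX_NUMA_NODES */
/* external socket IDs start at max(256, number of NUMA nodes) */
#define TOY_MIN_EXT_SOCKET_ID \
	((1 << 8) > TOY_MAX_NUMA_NODES ? (1 << 8) : TOY_MAX_NUMA_NODES)

struct toy_heap {
	char name[TOY_NAME_MAX_LEN]; /* empty name marks a free slot */
	int socket_id;
};

struct toy_config {
	struct toy_heap heaps[TOY_MAX_HEAPS];
	int next_socket_id;
};

/* bounded string length using only standard C */
static size_t
toy_strnlen(const char *s, size_t max)
{
	const char *end = memchr(s, '\0', max);

	return end != NULL ? (size_t)(end - s) : max;
}

static void
toy_config_init(struct toy_config *cfg)
{
	assert(cfg != NULL);
	memset(cfg, 0, sizeof(*cfg));
	cfg->next_socket_id = TOY_MIN_EXT_SOCKET_ID;
}

/* returns the new heap's socket ID, or a negative errno value */
static int
toy_heap_create(struct toy_config *cfg, const char *name)
{
	struct toy_heap *slot = NULL;
	size_t len;
	int i;

	assert(cfg != NULL);
	if (name == NULL)
		return -EINVAL;
	len = toy_strnlen(name, TOY_NAME_MAX_LEN);
	if (len == 0 || len == TOY_NAME_MAX_LEN)
		return -EINVAL; /* empty, or no room for the terminator */

	for (i = 0; i < TOY_MAX_HEAPS; i++) {
		struct toy_heap *tmp = &cfg->heaps[i];

		if (strncmp(name, tmp->name, TOY_NAME_MAX_LEN) == 0)
			return -EEXIST;
		if (tmp->name[0] == '\0') {
			slot = tmp; /* first free slot wins */
			break;
		}
	}
	if (slot == NULL)
		return -ENOSPC;

	memcpy(slot->name, name, len + 1);
	slot->socket_id = cfg->next_socket_id++;
	return slot->socket_id;
}
```

With a freshly initialised table, creating "ext0" and then "ext1" yields socket IDs 256 and 257 in this model, and a second "ext0" is rejected with -EEXIST. As in the patch, the scan stops at the first free slot, so names stored in later slots are not compared.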

* [dpdk-dev] [PATCH v8 12/21] malloc: allow destroying heaps
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (11 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 11/21] malloc: allow creating malloc heaps Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
                               ` (8 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to destroy a specified heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
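The order of checks performed before a heap is destroyed can be sketched as a small stand-alone model. This is not the DPDK implementation: the `toy_` names are hypothetical, locking is left out, and errors are returned as negative errno values rather than through rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_NUMA_NODES 8 /* stand-in for RTE_MAX_NUMA_NODES */

struct toy_heap {
	char name[32];
	int socket_id;
	unsigned int alloc_count; /* number of live allocations */
	void *first;              /* first element, NULL if no memory is attached */
	void *last;               /* last element, NULL if no memory is attached */
	size_t total_size;
};

/* returns 0 and zeroes the slot on success, else a negative errno value */
static int
toy_heap_destroy(struct toy_heap *heap)
{
	assert(heap != NULL);

	/* heaps backed by real NUMA nodes are reserved */
	if (heap->socket_id < TOY_MAX_NUMA_NODES)
		return -EPERM;
	/* every allocation must have been freed back to the heap */
	if (heap->alloc_count != 0)
		return -EBUSY;
	/* every memory chunk must have been removed from the heap */
	if (heap->first != NULL || heap->last != NULL)
		return -EBUSY;

	/* an all-zero slot has an empty name, so it can be reused */
	memset(heap, 0, sizeof(*heap));
	return 0;
}
```

The model implies a teardown order for callers: free all allocations, remove all memory chunks, and only then destroy the heap.
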
 lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++
 lib/librte_eal/common/malloc_heap.c        | 22 ++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 58 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 107 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index e326529d0..309bbbcc9 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
 
+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ *   from the heap has been freed back to the heap
+ *
+ * @note This function will return a failure result if not all memory segments
+ *   were removed from the heap prior to its destruction
+ *
+ * @param heap_name
+ *   Name of the heap to destroy.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     ENOENT - heap by the name of ``heap_name`` was not found
+ *     EPERM  - attempting to destroy reserved heap
+ *     EBUSY  - heap still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 00fdf54f7..ca774c96f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1053,6 +1053,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	return 0;
 }
 
+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+	if (heap->alloc_count != 0) {
+		RTE_LOG(ERR, EAL, "Heap is still in use\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->first != NULL || heap->last != NULL) {
+		RTE_LOG(ERR, EAL, "Heap still contains memory segments\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->total_size != 0)
+		RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+	/* after this, the lock will be dropped */
+	memset(heap, 0, sizeof(*heap));
+
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index eebee16dc..75278da3c 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 25967a7cb..286e748ef 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -313,6 +313,21 @@ rte_malloc_virt2iova(const void *addr)
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
 
+static struct malloc_heap *
+find_named_heap(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN))
+			return heap;
+	}
+	return NULL;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -363,3 +378,46 @@ rte_malloc_heap_create(const char *heap_name)
 
 	return ret;
 }
+
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to destroy internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	/* sanity checks done, now we can destroy the heap */
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_destroy(heap);
+
+	/* if we failed, lock is still active */
+	if (ret < 0)
+		rte_spinlock_unlock(&heap->lock);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 376f33bbb..27aac5bea 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_create;
+	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

* [dpdk-dev] [PATCH v8 13/21] malloc: allow adding memory to named heaps
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (12 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 12/21] malloc: allow destroying heaps Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 14/21] malloc: allow removing memory from " Anatoly Burakov
                               ` (7 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to add externally allocated memory to malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
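How a virtual area is split into per-page entries, with RTE_BAD_IOVA used when no IOVA table is supplied, can be sketched as follows. This is a stand-alone model under stated assumptions: the `toy_`/`TOY_` names are hypothetical, a plain array replaces the fbarray-backed memseg list, and the sketch additionally rejects a length that is not a multiple of the page size, which the patch itself does not check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BAD_IOVA UINT64_MAX /* stand-in for RTE_BAD_IOVA */

struct toy_page {
	void *addr;    /* virtual address of this page */
	uint64_t iova; /* IO address, or TOY_BAD_IOVA if unknown */
	size_t len;    /* always one page */
};

static int
toy_is_power_of_2(size_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Describe [va, va + len) as pages of page_sz bytes, writing at most out_cap
 * entries. Returns the number of pages written, or a negative errno value.
 */
static int
toy_describe_pages(void *va, size_t len, const uint64_t *iovas,
		unsigned int n_iovas, size_t page_sz,
		struct toy_page *out, unsigned int out_cap)
{
	size_t n, i;

	assert(out != NULL);
	if (va == NULL || len == 0 || !toy_is_power_of_2(page_sz))
		return -EINVAL;
	if (len % page_sz != 0)
		return -EINVAL; /* extra check, not present in the patch */

	n = len / page_sz;
	/* the IOVA table, when given, must have one entry per page */
	if (iovas != NULL && n != n_iovas)
		return -EINVAL;
	if (n > out_cap)
		return -ENOSPC;

	for (i = 0; i < n; i++) {
		out[i].addr = (char *)va + i * page_sz;
		out[i].iova = iovas == NULL ? TOY_BAD_IOVA : iovas[i];
		out[i].len = page_sz;
	}
	return (int)n;
}
```

In this model the page count is always derived from `len / page_sz`; the caller-supplied count only serves as a consistency check on the IOVA table, mirroring the `n != n_pages && iova_addrs != NULL` test in the patch.
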
 lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 74 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 51 +++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 169 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 309bbbcc9..fb5b6e2f7 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,45 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Add memory chunk to a heap with specified name.
+ *
+ * @note Multiple memory chunks can be added to the same heap
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ *   malloc heap. Failing to do so will result in undefined behavior, up to and
+ *   including segmentation faults.
+ *
+ * @note Calling this function will erase any contents already present at the
+ *   supplied memory address.
+ *
+ * @param heap_name
+ *   Name of the heap to add memory chunk to
+ * @param va_addr
+ *   Start of virtual area to add to the heap
+ * @param len
+ *   Length of virtual area to add to the heap
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to add memory to a reserved heap
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ca774c96f..256c25edf 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1023,6 +1023,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	char fbarray_name[RTE_FBARRAY_NAME_LEN];
+	struct rte_memseg_list *msl = NULL;
+	struct rte_fbarray *arr;
+	size_t seg_len = n_pages * page_sz;
+	unsigned int i;
+
+	/* first, find a free memseg list */
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+		if (tmp->base_va == NULL) {
+			msl = tmp;
+			break;
+		}
+	}
+	if (msl == NULL) {
+		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+			heap->name, va_addr);
+
+	/* create the backing fbarray */
+	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+			sizeof(struct rte_memseg)) < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+		return -1;
+	}
+	arr = &msl->memseg_arr;
+
+	/* fbarray created, fill it up */
+	for (i = 0; i < n_pages; i++) {
+		struct rte_memseg *ms;
+
+		rte_fbarray_set_used(arr, i);
+		ms = rte_fbarray_get(arr, i);
+		ms->addr = RTE_PTR_ADD(va_addr, i * page_sz);
+		ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+		ms->hugepage_sz = page_sz;
+		ms->len = page_sz;
+		ms->nchannel = rte_memory_get_nchannel();
+		ms->nrank = rte_memory_get_nrank();
+		ms->socket_id = heap->socket_id;
+	}
+
+	/* set up the memseg list */
+	msl->base_va = va_addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = heap->socket_id;
+	msl->len = seg_len;
+	msl->version = 0;
+	msl->external = 1;
+
+	/* erase contents of new memory */
+	memset(va_addr, 0, seg_len);
+
+	/* now, add newly minted memory to the malloc heap */
+	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+	heap->total_size += seg_len;
+
+	/* all done! */
+	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+			heap->name, va_addr);
+
+	return 0;
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 75278da3c..237ce9dc2 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 286e748ef..acdbd92a2 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -328,6 +328,57 @@ find_named_heap(const char *name)
 	return NULL;
 }
 
+int
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int n;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL ||
+			page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		/* lock is not held yet, so return directly */
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot add memory to internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	n = len / page_sz;
+	if (n != n_pages && iova_addrs != NULL) {
+		rte_errno = EINVAL;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+			page_sz);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 27aac5bea..02254042c 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -321,6 +321,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_create;
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_memory_add;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v8 14/21] malloc: allow removing memory from named heaps
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (13 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
                               ` (6 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to remove memory from specified heaps. This will first
check if all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to length of memory addressed by the
underlying memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
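The removal check described above (find the element that starts at the given address, confirm it belongs to a chunk of the given length, then confirm it is a single free element spanning the whole chunk) can be sketched on a plain linked list. This is a stand-alone model, not the DPDK code: the `toy_` names are hypothetical, and `chunk_len` stands in for the length stored in the element's memseg list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum toy_state { TOY_FREE, TOY_BUSY };

struct toy_elem {
	struct toy_elem *next; /* next element, in ascending address order */
	enum toy_state state;
	size_t size;           /* bytes covered by this element */
	size_t chunk_len;      /* length of the chunk this element was added with */
};

/* returns 0 if the chunk at va may be removed, else a negative errno value */
static int
toy_check_removable(struct toy_elem *first, void *va, size_t len)
{
	struct toy_elem *elem = first;

	assert(va != NULL);

	/* find the element that starts exactly at va */
	while (elem != NULL && (void *)elem != va) {
		elem = elem->next;
		/* elements are sorted, so stop once we are past va */
		if ((uintptr_t)elem > (uintptr_t)va)
			return -ENOENT;
	}
	/* not found, or not the region that was originally added */
	if (elem == NULL || elem->chunk_len != len)
		return -ENOENT;
	/* a split or allocated element means the chunk is still in use */
	if (elem->state == TOY_BUSY || elem->size != len)
		return -EBUSY;
	return 0;
}
```

A chunk only passes when its first element is free and as large as the chunk itself, which is the case once every allocation inside it has been freed and the free elements have merged back together.
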
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++
 lib/librte_eal/common/malloc_heap.c        | 54 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 125 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index fb5b6e2f7..40bae4478 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -302,6 +302,33 @@ int __rte_experimental
 rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+/**
+ * Remove memory chunk from heap with specified name.
+ *
+ * @note Memory chunk being removed must be the same as one that was added;
+ *   partially removing memory chunks is not supported
+ *
+ * @note Memory area must not contain any allocated elements to allow its
+ *   removal from the heap
+ *
+ * @param heap_name
+ *   Name of the heap to remove memory from
+ * @param va_addr
+ *   Virtual address to remove from the heap
+ * @param len
+ *   Length of virtual area to remove from the heap
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to remove memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ *     EBUSY  - memory chunk still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 256c25edf..adc1669aa 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1023,6 +1023,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+	struct malloc_heap *heap = elem->heap;
+	struct rte_memseg_list *msl;
+
+	msl = elem->msl;
+
+	/* this element can be removed */
+	malloc_elem_free_list_remove(elem);
+	malloc_elem_hide_region(elem, elem, len);
+
+	heap->total_size -= len;
+
+	memset(elem, 0, sizeof(*elem));
+
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -1097,6 +1123,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	return 0;
 }
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len)
+{
+	struct malloc_elem *elem = heap->first;
+
+	/* find element with specified va address */
+	while (elem != NULL && elem != va_addr) {
+		elem = elem->next;
+		/* stop if we've blown past our VA */
+		if (elem > (struct malloc_elem *)va_addr) {
+			rte_errno = ENOENT;
+			return -1;
+		}
+	}
+	/* check if element was found */
+	if (elem == NULL || elem->msl->len != len) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+	/* if element's size is not equal to segment len, segment is busy */
+	if (elem->state == ELEM_BUSY || elem->size != len) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+	return destroy_seg(elem, len);
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 237ce9dc2..e48996d52 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -43,6 +43,10 @@ int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index acdbd92a2..bfc49d0b7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -379,6 +379,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot remove memory from internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 02254042c..8c66d0be9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1

* [dpdk-dev] [PATCH v8 15/21] malloc: allow attaching to external memory chunks
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (14 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 14/21] malloc: allow removing memory from " Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 16/21] malloc: allow detaching from external memory Anatoly Burakov
                               ` (5 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
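The attach path relies on a walk over all memseg lists with a callback that stops at the first list whose base address and length match, and on a result field that defaults to failure. A stand-alone model of that pattern is sketched below; the `toy_` names are hypothetical, a flag replaces the call to rte_fbarray_attach(), and locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_LISTS 4 /* stand-in for RTE_MAX_MEMSEG_LISTS */

struct toy_list {
	void *base_va; /* NULL marks an unused list */
	size_t page_sz;
	size_t n_pages;
	int attached;  /* set where the patch would attach the fbarray */
};

struct toy_walk_arg {
	void *va_addr;
	size_t len;
	int result;
};

typedef int (*toy_walk_fn)(struct toy_list *l, void *arg);

/* call fn on every used list; stop at the first non-zero return value */
static int
toy_list_walk(struct toy_list *lists, toy_walk_fn fn, void *arg)
{
	int i, ret;

	for (i = 0; i < TOY_MAX_LISTS; i++) {
		if (lists[i].base_va == NULL)
			continue;
		ret = fn(&lists[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

static int
toy_attach_cb(struct toy_list *l, void *arg)
{
	struct toy_walk_arg *wa = arg;

	if (l->base_va == wa->va_addr &&
			l->page_sz * l->n_pages == wa->len) {
		l->attached = 1;
		wa->result = 0;
		return 1; /* match found, stop walking */
	}
	return 0; /* keep walking */
}

/* returns 0 on success, -ENOENT if no list matches va and len */
static int
toy_attach(struct toy_list *lists, void *va, size_t len)
{
	/* fail unless the callback explicitly reports success */
	struct toy_walk_arg wa = { va, len, -ENOENT };

	assert(lists != NULL);
	toy_list_walk(lists, toy_attach_cb, &wa);
	return wa.result;
}
```

Starting from -ENOENT means a walk that never finds a match needs no special handling after it returns.
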
 lib/librte_eal/common/include/rte_malloc.h | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 83 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 112 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 40bae4478..793f9473a 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket,
  *
  * @note Multiple memory chunks can be added to the same heap
  *
+ * @note Before this memory can be accessed in another process, it must be
+ *   attached there by calling ``rte_malloc_heap_memory_attach`` in that
+ *   process.
+ *
  * @note Memory must be previously allocated for DPDK to be able to use it as a
  *   malloc heap. Failing to do so will result in undefined behavior, up to and
  *   including segmentation faults.
@@ -329,6 +333,30 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 int __rte_experimental
 rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Attach to an already existing chunk of external memory in another process.
+ *
+ * @note This function must be called before any attempt is made to use an
+ *   already existing external memory chunk. This function does *not* need to
+ *   be called if a call to ``rte_malloc_heap_memory_add`` was made in the
+ *   current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to attach to
+ * @param len
+ *   Length of memory chunk to attach to
+ * @return
+ *   0 on successful attach
+ *   -1 on unsuccessful attach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to attach memory to a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index bfc49d0b7..5078235b1 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -418,6 +418,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+struct sync_mem_walk_arg {
+	void *va_addr;
+	size_t len;
+	int result;
+};
+
+static int
+attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct sync_mem_walk_arg *wa = arg;
+	size_t len = msl->page_sz * msl->memseg_arr.len;
+
+	if (msl->base_va == wa->va_addr &&
+			len == wa->len) {
+		struct rte_memseg_list *found_msl;
+		int msl_idx, ret;
+
+		/* msl is const */
+		msl_idx = msl - mcfg->memsegs;
+		found_msl = &mcfg->memsegs[msl_idx];
+
+		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+
+		if (ret < 0)
+			wa->result = -rte_errno;
+		else
+			wa->result = 0;
+		return 1;
+	}
+	return 0;
+}
+
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	struct sync_mem_walk_arg wa;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to attach to internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* find corresponding memseg list to attach to */
+	wa.va_addr = va_addr;
+	wa.len = len;
+	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+
+	/* we're already holding a read lock */
+	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+
+	if (wa.result < 0) {
+		rte_errno = -wa.result;
+		ret = -1;
+	} else {
+		ret = 0;
+	}
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 8c66d0be9..920852042 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_attach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v8 16/21] malloc: allow detaching from external memory
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (15 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 17/21] malloc: enable event callbacks for " Anatoly Burakov
                               ` (4 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add API to detach from existing chunk of external memory in a
process.
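
For illustration only (not part of this patch): the attach and detach
paths share one walk over the known memory chunks, with the result
preset to -ENOENT so that the call fails unless the callback finds a
matching chunk. A minimal standalone sketch of that pattern follows.
All names here (seg_list, walk_lists, sync_walk, sync_chunk) are
hypothetical stand-ins, not DPDK API, and the attach/detach step is
reduced to flipping a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memseg list describing one external chunk. */
struct seg_list {
	void *base_va;
	size_t len;
	bool attached;
};

/* Argument passed through the walk, mirroring the shape used in the patch. */
struct sync_arg {
	void *va_addr;
	size_t len;
	int result;
	bool attach;
};

#define N_LISTS 4
struct seg_list lists[N_LISTS];

/* Callback: return 1 to stop the walk, 0 to continue with the next list. */
int
sync_walk(struct seg_list *sl, void *arg)
{
	struct sync_arg *wa = arg;

	if (sl->base_va != wa->va_addr || sl->len != wa->len)
		return 0;
	sl->attached = wa->attach; /* stand-in for the real attach/detach */
	wa->result = 0;
	return 1;
}

/* Call cb on every populated list until cb returns non-zero. */
void
walk_lists(int (*cb)(struct seg_list *, void *), void *arg)
{
	size_t i;

	assert(cb != NULL);
	for (i = 0; i < N_LISTS; i++) {
		if (lists[i].base_va == NULL)
			continue;
		if (cb(&lists[i], arg))
			break;
	}
}

/* Returns 0 on success, or a negative errno value on failure. */
int
sync_chunk(void *va_addr, size_t len, bool attach)
{
	struct sync_arg wa = {
		.va_addr = va_addr,
		.len = len,
		.result = -ENOENT, /* fail unless the callback finds a match */
		.attach = attach,
	};

	if (va_addr == NULL || len == 0)
		return -EINVAL;
	walk_lists(sync_walk, &wa);
	return wa.result;
}
```

A chunk is matched only when both its start address and its length are
identical to the ones recorded, so a caller passing a partial range gets
-ENOENT rather than a partial detach.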

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 31 +++++++++++++++++-----
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 793f9473a..7249e6aae 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
  * @note Memory area must not contain any allocated elements to allow its
  *   removal from the heap
  *
+ * @note All other processes must detach from the memory chunk prior to it being
+ *   removed from the heap.
+ *
  * @param heap_name
  *   Name of the heap to remove memory from
  * @param va_addr
@@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 int __rte_experimental
 rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Detach from a chunk of external memory in secondary process.
+ *
+ * @note This function must be called before any attempt is made to remove
+ *   external memory from the heap in another process. This function does *not*
+ *   need to be called if ``rte_malloc_heap_memory_remove`` will be called in
+ *   the current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to detach from
+ * @param len
+ *   Length of memory chunk to detach from
+ * @return
+ *   0 on successful detach
+ *   -1 on unsuccessful detach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to detach memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5078235b1..72e42b337 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -422,10 +422,11 @@ struct sync_mem_walk_arg {
 	void *va_addr;
 	size_t len;
 	int result;
+	bool attach;
 };
 
 static int
-attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct sync_mem_walk_arg *wa = arg;
@@ -440,7 +441,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		if (wa->attach)
+			ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		else
+			ret = rte_fbarray_detach(&found_msl->memseg_arr);
 
 		if (ret < 0)
 			wa->result = -rte_errno;
@@ -451,8 +455,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 	return 0;
 }
 
-int
-rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+static int
+sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
@@ -475,20 +479,21 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 		ret = -1;
 		goto unlock;
 	}
-	/* we shouldn't be able to attach to internal heaps */
+	/* we shouldn't be able to sync to internal heaps */
 	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
 		rte_errno = EPERM;
 		ret = -1;
 		goto unlock;
 	}
 
-	/* find corresponding memseg list to attach to */
+	/* find corresponding memseg list to sync to */
 	wa.va_addr = va_addr;
 	wa.len = len;
 	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+	wa.attach = attach;
 
 	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
 
 	if (wa.result < 0) {
 		rte_errno = -wa.result;
@@ -501,6 +506,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, true);
+}
+
+int
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, false);
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 920852042..30583eef2 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -323,6 +323,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
 	rte_malloc_heap_memory_attach;
+	rte_malloc_heap_memory_detach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v8 17/21] malloc: enable event callbacks for external memory
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (16 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 16/21] malloc: allow detaching from external memory Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 18/21] test: add unit tests for external memory support Anatoly Burakov
                               ` (3 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: Hemant Agrawal, Shreyansh Jain, Maxime Coquelin, Tiwei Bie,
	Zhihong Wang, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, thomas, shahafs, arybchenko, alejandro.lucero

When adding or removing external memory from the memory map, there
may be actions that need to be taken on account of this memory (e.g.
DMA mapping). Add support for triggering callbacks when adding,
removing, attaching or detaching external memory.

Some memory event callback handlers will need additional logic to
handle external memory regions. For example, virtio callback has to
completely ignore externally allocated memory, because there is no
way to find file descriptors backing the memory address in a
generic fashion. All other callbacks have also been adjusted to
handle RTE_BAD_IOVA as IOVA address, as this is one of the expected
use cases for external memory support.
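
For illustration only (not part of this patch): the adjusted callbacks
all follow the same shape, namely walk the segments covering the region,
skip any segment whose IOVA is the "bad" sentinel, and still advance
past it. A minimal standalone sketch follows; BAD_IOVA is a stand-in for
RTE_BAD_IOVA, and struct seg, dma_map and map_region are hypothetical
names, with dma_map only counting bytes instead of programming an IOMMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BAD_IOVA UINT64_MAX /* stand-in for RTE_BAD_IOVA */

/* Hypothetical, simplified memory segment. */
struct seg {
	uint64_t iova;
	size_t len;
};

/* Bytes handed to the (hypothetical) DMA mapping hook so far. */
size_t mapped_bytes;

void
dma_map(uint64_t iova, size_t len)
{
	(void)iova;
	mapped_bytes += len;
}

/*
 * Walk the contiguous segments covering len bytes, mapping each one
 * unless its IOVA is invalid. Returns the number of skipped segments.
 */
unsigned int
map_region(const struct seg *ms, size_t len)
{
	size_t cur_len = 0;
	unsigned int skipped = 0;

	assert(ms != NULL);
	while (cur_len < len) {
		assert(ms->len > 0); /* a zero-length segment would never advance */
		if (ms->iova == BAD_IOVA)
			skipped++;
		else
			dma_map(ms->iova, ms->len);
		/* always advance, even past a skipped segment */
		cur_len += ms->len;
		++ms;
	}
	return skipped;
}
```

The point to note is that the length accounting happens on both paths;
skipping a segment without advancing would loop forever on the first
bad IOVA.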

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/bus/fslmc/fslmc_vfio.c                |  7 +++++
 .../net/virtio/virtio_user/virtio_user_dev.c  |  6 +++++
 lib/librte_eal/common/malloc_heap.c           |  7 +++++
 lib/librte_eal/common/rte_malloc.c            | 27 ++++++++++++++++---
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 10 +++++--
 5 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index cb33dd891..493b6e5be 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -221,6 +221,13 @@ fslmc_memevent_cb(enum rte_mem_event type, const void *addr, size_t len,
 					"alloc" : "dealloc",
 				va, virt_addr, iova_addr, map_len);
 
+		/* iova_addr may be set to RTE_BAD_IOVA */
+		if (iova_addr == RTE_BAD_IOVA) {
+			DPAA2_BUS_DEBUG("Segment has invalid iova, skipping\n");
+			cur_len += map_len;
+			continue;
+		}
+
 		if (type == RTE_MEM_EVENT_ALLOC)
 			ret = fslmc_map_dma(virt_addr, iova_addr, map_len);
 		else
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 55a82e4b0..a185aed34 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -301,8 +301,14 @@ virtio_user_mem_event_cb(enum rte_mem_event type __rte_unused,
 						 void *arg)
 {
 	struct virtio_user_dev *dev = arg;
+	struct rte_memseg_list *msl;
 	uint16_t i;
 
+	/* ignore externally allocated memory */
+	msl = rte_mem_virt2memseg_list(addr);
+	if (msl->external)
+		return;
+
 	pthread_mutex_lock(&dev->mutex);
 
 	if (dev->started == false)
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index adc1669aa..08ec75377 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1031,6 +1031,9 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 
 	msl = elem->msl;
 
+	/* notify all subscribers that a memory area is going to be removed */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, elem, len);
+
 	/* this element can be removed */
 	malloc_elem_free_list_remove(elem);
 	malloc_elem_hide_region(elem, elem, len);
@@ -1120,6 +1123,10 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
 			heap->name, va_addr);
 
+	/* notify all subscribers that a new memory area has been added */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+			va_addr, seg_len);
+
 	return 0;
 }
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72e42b337..2c19c2f87 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -25,6 +25,7 @@
 #include <rte_malloc.h>
 #include "malloc_elem.h"
 #include "malloc_heap.h"
+#include "eal_memalloc.h"
 
 
 /* Free the memory space back to heap */
@@ -441,15 +442,29 @@ sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		if (wa->attach)
+		if (wa->attach) {
 			ret = rte_fbarray_attach(&found_msl->memseg_arr);
-		else
+		} else {
+			/* notify all subscribers that a memory area is about to
+			 * be removed
+			 */
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+					msl->base_va, msl->len);
 			ret = rte_fbarray_detach(&found_msl->memseg_arr);
+		}
 
-		if (ret < 0)
+		if (ret < 0) {
 			wa->result = -rte_errno;
-		else
+		} else {
+			/* notify all subscribers that a new memory area was
+			 * added
+			 */
+			if (wa->attach)
+				eal_memalloc_mem_event_notify(
+						RTE_MEM_EVENT_ALLOC,
+						msl->base_va, msl->len);
 			wa->result = 0;
+		}
 		return 1;
 	}
 	return 0;
@@ -499,6 +514,10 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 		rte_errno = -wa.result;
 		ret = -1;
 	} else {
+		/* notify all subscribers that a new memory area was added */
+		if (attach)
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+					va_addr, len);
 		ret = 0;
 	}
 unlock:
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index fddbc3b54..d7268e4ce 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -509,7 +509,7 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	msl = rte_mem_virt2memseg_list(addr);
 
 	/* for IOVA as VA mode, no need to care for IOVA addresses */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
+	if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) {
 		uint64_t vfio_va = (uint64_t)(uintptr_t)addr;
 		if (type == RTE_MEM_EVENT_ALLOC)
 			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
@@ -523,13 +523,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	/* memsegs are contiguous in memory */
 	ms = rte_mem_virt2memseg(addr, msl);
 	while (cur_len < len) {
+		/* some memory segments may have invalid IOVA */
+		if (ms->iova == RTE_BAD_IOVA) {
+			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
+					ms->addr);
+			goto next;
+		}
 		if (type == RTE_MEM_EVENT_ALLOC)
 			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
 					ms->iova, ms->len, 1);
 		else
 			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
 					ms->iova, ms->len, 0);
-
+next:
 		cur_len += ms->len;
 		++ms;
 	}
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v8 18/21] test: add unit tests for external memory support
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (17 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 17/21] malloc: enable event callbacks for " Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 19/21] app/testpmd: add support for external memory Anatoly Burakov
                               ` (2 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add simple unit tests to test external memory support.
The tests are pretty basic and mostly consist of checking
if invalid API calls are handled correctly, plus a simple
allocation/deallocation test for malloc and memzone.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 test/test/Makefile            |   1 +
 test/test/autotest_data.py    |  14 +-
 test/test/meson.build         |   1 +
 test/test/test_external_mem.c | 389 ++++++++++++++++++++++++++++++++++
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 test/test/test_external_mem.c

diff --git a/test/test/Makefile b/test/test/Makefile
index e6967bab6..074ac6e03 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c
 SRCS-y += test_reciprocal_division.c
 SRCS-y += test_reciprocal_division_perf.c
 SRCS-y += test_fbarray.c
+SRCS-y += test_external_mem.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py
index f68d9b111..51f8e1689 100644
--- a/test/test/autotest_data.py
+++ b/test/test/autotest_data.py
@@ -477,10 +477,16 @@
         "Report":  None,
     },
     {
-        "Name":    "Fbarray autotest",
-        "Command": "fbarray_autotest",
-        "Func":    default_autotest,
-        "Report":  None,
+	"Name":    "Fbarray autotest",
+	"Command": "fbarray_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
+    },
+    {
+	"Name":    "External memory autotest",
+	"Command": "external_mem_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
     },
     #
     #Please always keep all dump tests at the end and together!
diff --git a/test/test/meson.build b/test/test/meson.build
index b1dd6eca2..3abf02b71 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -155,6 +155,7 @@ test_names = [
 	'eventdev_common_autotest',
 	'eventdev_octeontx_autotest',
 	'eventdev_sw_autotest',
+	'external_mem_autotest',
 	'func_reentrancy_autotest',
 	'flow_classify_autotest',
 	'hash_scaling_autotest',
diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c
new file mode 100644
index 000000000..d0837aa35
--- /dev/null
+++ b/test/test/test_external_mem.c
@@ -0,0 +1,389 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+
+#include "test.h"
+
+#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */
+
+static int
+test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova,
+		int n_pages)
+{
+	static const char * const names[] = {
+		NULL, /* NULL name */
+		"",   /* empty name */
+		"this heap name is definitely way too long to be valid"
+	};
+	const char *valid_name = "valid heap name";
+	unsigned int i;
+
+	/* check invalid name handling */
+	for (i = 0; i < RTE_DIM(names); i++) {
+		const char *name = names[i];
+
+		/* these calls may fail for other reasons, so check errno */
+		if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Created heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Destroyed heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_get_socket(name) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Found socket for heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_add(name, addr, len,
+				NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Added memory to heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Removed memory from heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Attached memory to heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Detached memory from heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* do same as above, but with a valid heap name */
+
+	/* skip create call */
+	if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Destroyed non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_get_socket(valid_name) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Found socket for non-existent heap\n",
+				__func__, __LINE__);
+		goto fail;
+	}
+
+	/* these calls may fail for other reasons, so check errno */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Added memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Removed memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Attached memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Detached memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* create a valid heap but test other invalid parameters */
+	if (rte_malloc_heap_create(valid_name) != 0) {
+		printf("%s():%i: Failed to create valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero length */
+	if (rte_malloc_heap_memory_add(valid_name, addr, 0,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero address */
+	if (rte_malloc_heap_memory_add(valid_name, NULL, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* wrong page count */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* tests passed, destroy heap */
+	if (rte_malloc_heap_destroy(valid_name) != 0) {
+		printf("%s():%i: Failed to destroy valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	return 0;
+fail:
+	rte_malloc_heap_destroy(valid_name);
+	return -1;
+}
+
+static int
+test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages)
+{
+	const char *heap_name = "heap";
+	void *ptr = NULL;
+	int socket_id, i;
+	const struct rte_memzone *mz = NULL;
+
+	/* create heap */
+	if (rte_malloc_heap_create(heap_name) != 0) {
+		printf("%s():%i: Failed to create malloc heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* get socket ID corresponding to this heap */
+	socket_id = rte_malloc_heap_get_socket(heap_name);
+	if (socket_id < 0) {
+		printf("%s():%i: cannot find socket for external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* heap is empty, so any allocation should fail */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr != NULL) {
+		printf("%s():%i: Allocated from empty heap\n", __func__,
+			__LINE__);
+		goto fail;
+	}
+
+	/* add memory to heap */
+	if (rte_malloc_heap_memory_add(heap_name, addr, len,
+			iova, n_pages, pgsz) != 0) {
+		printf("%s():%i: Failed to add memory to heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check that we can get this memory from EAL now */
+	for (i = 0; i < n_pages; i++) {
+		const struct rte_memseg *ms;
+		void *cur = RTE_PTR_ADD(addr, pgsz * i);
+
+		ms = rte_mem_virt2memseg(cur, NULL);
+		if (ms == NULL) {
+			printf("%s():%i: Failed to retrieve memseg for external mem\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (ms->addr != cur) {
+			printf("%s():%i: VA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+		if (ms->iova != iova[i]) {
+			printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* allocate - this now should succeed */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr == NULL) {
+		printf("%s():%i: Failed to allocate from external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check if address is in expected range */
+	if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) {
+		printf("%s():%i: Allocated from unexpected address space\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* we've allocated something - removing memory should fail */
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 ||
+			rte_errno != EBUSY) {
+		printf("%s():%i: Removing memory succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) {
+		printf("%s():%i: Destroying heap succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* try allocating an IOVA-contiguous memzone - this should succeed
+	 * because we've set up a contiguous IOVA table.
+	 */
+	mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id,
+			RTE_MEMZONE_IOVA_CONTIG);
+	if (mz == NULL) {
+		printf("%s():%i: Failed to reserve memzone\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	rte_malloc_dump_stats(stdout, NULL);
+	rte_malloc_dump_heaps(stdout);
+
+	/* free memory - removing it should now succeed */
+	rte_free(ptr);
+	ptr = NULL;
+
+	rte_memzone_free(mz);
+	mz = NULL;
+
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) {
+		printf("%s():%i: Removing memory from heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) != 0) {
+		printf("%s():%i: Destroying heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	return 0;
+fail:
+	rte_memzone_free(mz);
+	rte_free(ptr);
+	/* even if something failed, attempt to clean up */
+	rte_malloc_heap_memory_remove(heap_name, addr, len);
+	rte_malloc_heap_destroy(heap_name);
+
+	return -1;
+}
+
+/* we need to test attach/detach in secondary processes. */
+static int
+test_external_mem(void)
+{
+	size_t len = EXTERNAL_MEM_SZ;
+	size_t pgsz = RTE_PGSIZE_4K;
+	rte_iova_t iova[len / pgsz];
+	void *addr;
+	int ret, n_pages;
+
+	/* create external memory area */
+	n_pages = RTE_DIM(iova);
+	addr = mmap(NULL, len, PROT_WRITE | PROT_READ,
+			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED) {
+		printf("%s():%i: Failed to create dummy memory area\n",
+			__func__, __LINE__);
+		return -1;
+	}
+	for (int i = 0; i < n_pages; i++) {
+		/* arbitrary IOVA */
+		rte_iova_t tmp = 0x100000000 + i * pgsz;
+		iova[i] = tmp;
+	}
+
+	ret = test_invalid_param(addr, len, pgsz, iova, n_pages);
+	ret |= test_basic(addr, len, pgsz, iova, n_pages);
+
+	munmap(addr, len);
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem);
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v8 19/21] app/testpmd: add support for external memory
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (18 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 18/21] test: add unit tests for external memory support Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 15:11               ` Iremonger, Bernard
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 20/21] doc: add external memory feature to the release notes Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
  21 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger, John McNamara,
	Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

Currently, mempools can only be allocated either using native
DPDK memory, or anonymous memory. This patch will add two new
methods to allocate mempool using external memory (regular or
hugepage memory), and add documentation about it to testpmd
user guide.

It adds a new flag "--mp-alloc", with four possible values:
native (use regular DPDK allocator), anon (use anonymous
mempool), xmem (use externally allocated memory area), and
xmemhuge (use externally allocated hugepage memory area). Old
flag "--mp-anon" is kept for compatibility.

All external memory is added to the same external heap, but each
mempool allocates and adds its own new memory area to that heap.
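
For illustration only (not part of this patch): mapping the option
value to a mode can also be written as a table lookup, which keeps the
list of accepted strings in one place. A minimal standalone sketch
follows; the enum and function names are hypothetical and are not the
ones used in testpmd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mode values; the patch's own names live in testpmd.h. */
enum mp_mode {
	MODE_NATIVE,
	MODE_ANON,
	MODE_XMEM,
	MODE_XMEM_HUGE,
	MODE_INVALID
};

/* Map an option value to a mode; exact matches only. */
enum mp_mode
parse_mp_mode(const char *arg)
{
	static const struct {
		const char *name;
		enum mp_mode mode;
	} table[] = {
		{ "native", MODE_NATIVE },
		{ "anon", MODE_ANON },
		{ "xmem", MODE_XMEM },
		{ "xmemhuge", MODE_XMEM_HUGE },
	};
	size_t i;

	assert(arg != NULL);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (strcmp(arg, table[i].name) == 0)
			return table[i].mode;
	}
	return MODE_INVALID;
}
```

Because the comparison is exact, "xmem" and "xmemhuge" cannot be
confused with each other, and any other string yields MODE_INVALID for
the caller to report.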

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/config.c                 |  21 +-
 app/test-pmd/parameters.c             |  23 +-
 app/test-pmd/testpmd.c                | 320 ++++++++++++++++++++++++--
 app/test-pmd/testpmd.h                |  13 +-
 doc/guides/testpmd_app_ug/run_app.rst |  12 +
 5 files changed, 364 insertions(+), 25 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 794aa5268..3b921cfc6 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2423,6 +2423,23 @@ fwd_config_setup(void)
 		simple_fwd_config_setup();
 }
 
+static const char *
+mp_alloc_to_str(uint8_t mode)
+{
+	switch (mode) {
+	case MP_ALLOC_NATIVE:
+		return "native";
+	case MP_ALLOC_ANON:
+		return "anon";
+	case MP_ALLOC_XMEM:
+		return "xmem";
+	case MP_ALLOC_XMEM_HUGE:
+		return "xmemhuge";
+	default:
+		return "invalid";
+	}
+}
+
 void
 pkt_fwd_config_display(struct fwd_config *cfg)
 {
@@ -2431,12 +2448,12 @@ pkt_fwd_config_display(struct fwd_config *cfg)
 	streamid_t sm_id;
 
 	printf("%s packet forwarding%s - ports=%d - cores=%d - streams=%d - "
-		"NUMA support %s, MP over anonymous pages %s\n",
+		"NUMA support %s, MP allocation mode: %s\n",
 		cfg->fwd_eng->fwd_mode_name,
 		retry_enabled == 0 ? "" : " with retry",
 		cfg->nb_fwd_ports, cfg->nb_fwd_lcores, cfg->nb_fwd_streams,
 		numa_support == 1 ? "enabled" : "disabled",
-		mp_anon != 0 ? "enabled" : "disabled");
+		mp_alloc_to_str(mp_alloc_type));
 
 	if (retry_enabled)
 		printf("TX retry num: %u, delay between TX retries: %uus\n",
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 9220e1c1b..b4016668c 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -190,6 +190,11 @@ usage(char* progname)
 	printf("  --vxlan-gpe-port=N: UPD port of tunnel VXLAN-GPE\n");
 	printf("  --mlockall: lock all memory\n");
 	printf("  --no-mlockall: do not lock all memory\n");
+	printf("  --mp-alloc <native|anon|xmem|xmemhuge>: mempool allocation method.\n"
+	       "    native: use regular DPDK memory to create and populate mempool\n"
+	       "    anon: use regular DPDK memory to create and anonymous memory to populate mempool\n"
+	       "    xmem: use anonymous memory to create and populate mempool\n"
+	       "    xmemhuge: use anonymous hugepage memory to create and populate mempool\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -625,6 +630,7 @@ launch_args_parse(int argc, char** argv)
 		{ "vxlan-gpe-port",		1, 0, 0 },
 		{ "mlockall",			0, 0, 0 },
 		{ "no-mlockall",		0, 0, 0 },
+		{ "mp-alloc",			1, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -743,7 +749,22 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "numa"))
 				numa_support = 1;
 			if (!strcmp(lgopts[opt_idx].name, "mp-anon")) {
-				mp_anon = 1;
+				mp_alloc_type = MP_ALLOC_ANON;
+			}
+			if (!strcmp(lgopts[opt_idx].name, "mp-alloc")) {
+				if (!strcmp(optarg, "native"))
+					mp_alloc_type = MP_ALLOC_NATIVE;
+				else if (!strcmp(optarg, "anon"))
+					mp_alloc_type = MP_ALLOC_ANON;
+				else if (!strcmp(optarg, "xmem"))
+					mp_alloc_type = MP_ALLOC_XMEM;
+				else if (!strcmp(optarg, "xmemhuge"))
+					mp_alloc_type = MP_ALLOC_XMEM_HUGE;
+				else
+					rte_exit(EXIT_FAILURE,
+						"mp-alloc %s invalid - must be: "
+						"native, anon or xmem\n",
+						 optarg);
 			}
 			if (!strcmp(lgopts[opt_idx].name, "port-numa-config")) {
 				if (parse_portnuma_config(optarg))
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e552..255a9c664 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -27,6 +27,7 @@
 #include <rte_log.h>
 #include <rte_debug.h>
 #include <rte_cycles.h>
+#include <rte_malloc_heap.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_launch.h>
@@ -63,6 +64,22 @@
 
 #include "testpmd.h"
 
+#ifndef MAP_HUGETLB
+/* FreeBSD may not have MAP_HUGETLB (in fact, it probably doesn't) */
+#define HUGE_FLAG (0x40000)
+#else
+#define HUGE_FLAG MAP_HUGETLB
+#endif
+
+#ifndef MAP_HUGE_SHIFT
+/* older kernels (or FreeBSD) will not have this define */
+#define HUGE_SHIFT (26)
+#else
+#define HUGE_SHIFT MAP_HUGE_SHIFT
+#endif
+
+#define EXTMEM_HEAP_NAME "extmem"
+
 uint16_t verbose_level = 0; /**< Silent by default. */
 int testpmd_logtype; /**< Log type for testpmd logs */
 
@@ -88,9 +105,13 @@ uint8_t numa_support = 1; /**< numa enabled by default */
 uint8_t socket_num = UMA_NO_CONFIG;
 
 /*
- * Use ANONYMOUS mapped memory (might be not physically continuous) for mbufs.
+ * Select mempool allocation type:
+ * - native: use regular DPDK memory
+ * - anon: use regular DPDK memory to create mempool, but populate using
+ *         anonymous memory (may not be IOVA-contiguous)
+ * - xmem: use externally allocated hugepage memory
  */
-uint8_t mp_anon = 0;
+uint8_t mp_alloc_type = MP_ALLOC_NATIVE;
 
 /*
  * Store specified sockets on which memory pool to be used by ports
@@ -527,6 +548,231 @@ set_def_fwd_config(void)
 	set_default_fwd_ports_config();
 }
 
+/* extremely pessimistic estimation of memory required to create a mempool */
+static int
+calc_mem_size(uint32_t nb_mbufs, uint32_t mbuf_sz, size_t pgsz, size_t *out)
+{
+	unsigned int n_pages, mbuf_per_pg, leftover;
+	uint64_t total_mem, mbuf_mem, obj_sz;
+
+	/* there is no good way to predict how much space the mempool will
+	 * occupy because it will allocate chunks on the fly, and some of those
+	 * will come from default DPDK memory while some will come from our
+	 * external memory, so just assume 128MB will be enough for everyone.
+	 */
+	uint64_t hdr_mem = 128 << 20;
+
+	/* account for possible non-contiguousness */
+	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
+	if (obj_sz > pgsz) {
+		TESTPMD_LOG(ERR, "Object size is bigger than page size\n");
+		return -1;
+	}
+
+	mbuf_per_pg = pgsz / obj_sz;
+	leftover = (nb_mbufs % mbuf_per_pg) > 0;
+	n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
+
+	mbuf_mem = n_pages * pgsz;
+
+	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
+
+	if (total_mem > SIZE_MAX) {
+		TESTPMD_LOG(ERR, "Memory size too big\n");
+		return -1;
+	}
+	*out = (size_t)total_mem;
+
+	return 0;
+}
+
+static inline uint32_t
+bsf64(uint64_t v)
+{
+	return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+	if (v == 0)
+		return 0;
+	v = rte_align64pow2(v);
+	return bsf64(v);
+}
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+	/* as per mmap() manpage, all page sizes are log2 of page size
+	 * shifted by MAP_HUGE_SHIFT
+	 */
+	int log2 = log2_u64(page_sz);
+	return (log2 << HUGE_SHIFT);
+}
+
+static void *
+alloc_mem(size_t memsz, size_t pgsz, bool huge)
+{
+	void *addr;
+	int flags;
+
+	/* allocate anonymous hugepages */
+	flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	if (huge)
+		flags |= HUGE_FLAG | pagesz_flags(pgsz);
+
+	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
+	if (addr == MAP_FAILED)
+		return NULL;
+
+	return addr;
+}
+
+struct extmem_param {
+	void *addr;
+	size_t len;
+	size_t pgsz;
+	rte_iova_t *iova_table;
+	unsigned int iova_table_len;
+};
+
+static int
+create_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, struct extmem_param *param,
+		bool huge)
+{
+	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
+			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
+	unsigned int cur_page, n_pages, pgsz_idx;
+	size_t mem_sz, cur_pgsz;
+	rte_iova_t *iovas = NULL;
+	void *addr;
+	int ret;
+
+	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
+		/* skip anything that is too big */
+		if (pgsizes[pgsz_idx] > SIZE_MAX)
+			continue;
+
+		cur_pgsz = pgsizes[pgsz_idx];
+
+		/* if we were told not to allocate hugepages, override */
+		if (!huge)
+			cur_pgsz = sysconf(_SC_PAGESIZE);
+
+		ret = calc_mem_size(nb_mbufs, mbuf_sz, cur_pgsz, &mem_sz);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot calculate memory size\n");
+			return -1;
+		}
+
+		/* allocate our memory */
+		addr = alloc_mem(mem_sz, cur_pgsz, huge);
+
+		/* if we couldn't allocate memory with a specified page size,
+		 * that doesn't mean we can't do it with other page sizes, so
+		 * try another one.
+		 */
+		if (addr == NULL)
+			continue;
+
+		/* store IOVA addresses for every page in this memory area */
+		n_pages = mem_sz / cur_pgsz;
+
+		iovas = malloc(sizeof(*iovas) * n_pages);
+
+		if (iovas == NULL) {
+			TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n");
+			goto fail;
+		}
+		/* lock memory if it's not huge pages */
+		if (!huge)
+			mlock(addr, mem_sz);
+
+		/* populate IOVA addresses */
+		for (cur_page = 0; cur_page < n_pages; cur_page++) {
+			rte_iova_t iova;
+			size_t offset;
+			void *cur;
+
+			offset = cur_pgsz * cur_page;
+			cur = RTE_PTR_ADD(addr, offset);
+			iova = rte_mem_virt2iova(cur);
+
+			iovas[cur_page] = iova;
+		}
+
+		break;
+	}
+	/* if we couldn't allocate anything */
+	if (iovas == NULL)
+		return -1;
+
+	param->addr = addr;
+	param->len = mem_sz;
+	param->pgsz = cur_pgsz;
+	param->iova_table = iovas;
+	param->iova_table_len = n_pages;
+
+	return 0;
+fail:
+	if (iovas)
+		free(iovas);
+	if (addr)
+		munmap(addr, mem_sz);
+
+	return -1;
+}
+
+static int
+setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge)
+{
+	struct extmem_param param;
+	int socket_id, ret;
+
+	memset(&param, 0, sizeof(param));
+
+	/* check if our heap exists */
+	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+	if (socket_id < 0) {
+		/* create our heap */
+		ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot create heap\n");
+			return -1;
+		}
+	}
+
+	ret = create_extmem(nb_mbufs, mbuf_sz, &param, huge);
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot create memory area\n");
+		return -1;
+	}
+
+	/* we now have a valid memory area, so add it to heap */
+	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
+			param.addr, param.len, param.iova_table,
+			param.iova_table_len, param.pgsz);
+
+	/* when using VFIO, memory is automatically mapped for DMA by EAL */
+
+	/* not needed any more */
+	free(param.iova_table);
+
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot add memory to heap\n");
+		munmap(param.addr, param.len);
+		return -1;
+	}
+
+	/* success */
+
+	TESTPMD_LOG(DEBUG, "Allocated %zuMB of external memory\n",
+			param.len >> 20);
+
+	return 0;
+}
+
 /*
  * Configuration initialisation done once at init time.
  */
@@ -545,27 +791,59 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
 		pool_name, nb_mbuf, mbuf_seg_size, socket_id);
 
-	if (mp_anon != 0) {
-		rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
-			mb_size, (unsigned) mb_mempool_cache,
-			sizeof(struct rte_pktmbuf_pool_private),
-			socket_id, 0);
-		if (rte_mp == NULL)
-			goto err;
+	switch (mp_alloc_type) {
+	case MP_ALLOC_NATIVE:
+		{
+			/* wrapper to rte_mempool_create() */
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+				mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+			break;
+		}
+	case MP_ALLOC_ANON:
+		{
+			rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				mb_size, (unsigned int) mb_mempool_cache,
+				sizeof(struct rte_pktmbuf_pool_private),
+				socket_id, 0);
+			if (rte_mp == NULL)
+				goto err;
+
+			if (rte_mempool_populate_anon(rte_mp) == 0) {
+				rte_mempool_free(rte_mp);
+				rte_mp = NULL;
+				goto err;
+			}
+			rte_pktmbuf_pool_init(rte_mp, NULL);
+			rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
+			break;
+		}
+	case MP_ALLOC_XMEM:
+	case MP_ALLOC_XMEM_HUGE:
+		{
+			int heap_socket;
+			bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE;
 
-		if (rte_mempool_populate_anon(rte_mp) == 0) {
-			rte_mempool_free(rte_mp);
-			rte_mp = NULL;
-			goto err;
+			if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0)
+				rte_exit(EXIT_FAILURE, "Could not create external memory\n");
+
+			heap_socket =
+				rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+			if (heap_socket < 0)
+				rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");
+
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+					mb_mempool_cache, 0, mbuf_seg_size,
+					heap_socket);
+			break;
+		}
+	default:
+		{
+			rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n");
 		}
-		rte_pktmbuf_pool_init(rte_mp, NULL);
-		rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
-	} else {
-		/* wrapper to rte_mempool_create() */
-		TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
-				rte_mbuf_best_mempool_ops());
-		rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-			mb_mempool_cache, 0, mbuf_seg_size, socket_id);
 	}
 
 err:
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a1f661472..65e0cec90 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -69,6 +69,16 @@ enum {
 	PORT_TOPOLOGY_LOOP,
 };
 
+enum {
+	MP_ALLOC_NATIVE, /**< allocate and populate mempool natively */
+	MP_ALLOC_ANON,
+	/**< allocate mempool natively, but populate using anonymous memory */
+	MP_ALLOC_XMEM,
+	/**< allocate and populate mempool using anonymous memory */
+	MP_ALLOC_XMEM_HUGE
+	/**< allocate and populate mempool using anonymous hugepage memory */
+};
+
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 /**
  * The data structure associated with RX and TX packet burst statistics
@@ -304,7 +314,8 @@ extern uint8_t  numa_support; /**< set by "--numa" parameter */
 extern uint16_t port_topology; /**< set by "--port-topology" parameter */
 extern uint8_t no_flush_rx; /**<set by "--no-flush-rx" parameter */
 extern uint8_t flow_isolate_all; /**< set by "--flow-isolate-all */
-extern uint8_t  mp_anon; /**< set by "--mp-anon" parameter */
+extern uint8_t  mp_alloc_type;
+/**< set by "--mp-anon" or "--mp-alloc" parameter */
 extern uint8_t no_link_check; /**<set by "--disable-link-check" parameter */
 extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f301c2b6f..67a8532a4 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -498,3 +498,15 @@ The commandline options are:
 *   ``--no-mlockall``
 
     Disable locking all memory.
+
+*   ``--mp-alloc <native|anon|xmem|xmemhuge>``
+
+    Select mempool allocation mode:
+
+    * native: create and populate mempool using native DPDK memory
+    * anon: create mempool using native DPDK memory, but populate using
+      anonymous memory
+    * xmem: create and populate mempool using externally and anonymously
+      allocated area
+    * xmemhuge: create and populate mempool using externally and anonymously
+      allocated hugepage area
-- 
2.17.1
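
For readers following the sizing and mmap() flag logic in this patch, here is
a minimal standalone sketch of the two pieces of arithmetic involved. It is
not part of the patch: estimate_extmem_size(), sketch_log2_pow2(),
sketch_pagesz_flags() and SKETCH_HUGE_SHIFT are hypothetical names, the
object size is passed in directly instead of being obtained from
rte_mempool_calc_obj_size(), and the shift of 26 is assumed to match the
fallback value the patch uses when MAP_HUGE_SHIFT is not defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same fallback value the patch uses for HUGE_SHIFT. */
#define SKETCH_HUGE_SHIFT 26

/* Integer log2 of a power-of-two page size (hypothetical helper). */
static unsigned int
sketch_log2_pow2(uint64_t v)
{
	unsigned int n = 0;

	while (v > 1) {
		v >>= 1;
		n++;
	}
	return n;
}

/* Page size encoded for mmap(): log2(page size) shifted by the huge shift.
 * An unsigned result is used so that 16G (34 << 26) still fits in 32 bits.
 */
static unsigned int
sketch_pagesz_flags(uint64_t page_sz)
{
	return sketch_log2_pow2(page_sz) << SKETCH_HUGE_SHIFT;
}

/* Pessimistic size estimate: a fixed header allowance plus enough whole
 * pages to hold all objects, assuming no object may straddle a page.
 * Returns 0 on success, -1 if an object cannot fit in one page.
 */
static int
estimate_extmem_size(uint32_t nb_objs, uint64_t obj_sz, uint64_t pgsz,
		uint64_t hdr_mem, uint64_t *out)
{
	uint64_t objs_per_pg, n_pages, total;

	assert(out != NULL);
	if (obj_sz == 0 || obj_sz > pgsz)
		return -1;

	objs_per_pg = pgsz / obj_sz;
	/* round up: a partially filled last page still costs a full page */
	n_pages = (nb_objs + objs_per_pg - 1) / objs_per_pg;
	total = hdr_mem + n_pages * pgsz;
	/* round the total up to a multiple of the page size */
	total = (total + pgsz - 1) / pgsz * pgsz;
	*out = total;
	return 0;
}
```

As a worked case, 1000 objects of 2304 bytes on 2M pages give 910 objects per
page, so two pages are needed; with a 128MB header allowance the estimate is
132MB.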

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v8 20/21] doc: add external memory feature to the release notes
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (19 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 19/21] app/testpmd: add support for external memory Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Document the addition of external memory support to DPDK.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index cb6308b1f..9c5f24af3 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -54,6 +54,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added support for using externally allocated memory in DPDK.**
+
+  DPDK has gained support for creating new ``rte_malloc`` heaps referencing
+  memory that was created outside of DPDK's own page allocator, and using that
+  memory natively with any other DPDK library or data structure.
+
 * **Add support to offload more flow match and actions for CXGBE PMD**
 
   Flow API support has been enhanced for CXGBE Poll Mode Driver to offload:
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v8 21/21] doc: add external memory feature to programmer's guide
  2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
                               ` (20 preceding siblings ...)
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 20/21] doc: add external memory feature to the release notes Anatoly Burakov
@ 2018-10-01 12:56             ` Anatoly Burakov
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add a short chapter on usage of external memory in DPDK to the
Programmer's Guide.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      | 37 +++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d362c9209..00ce64ceb 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -213,6 +213,43 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
+Support for Externally Allocated Memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to use externally allocated memory in DPDK, using a set of malloc
+heap API's. Support for externally allocated memory is implemented through
+overloading the socket ID - externally allocated heaps will have socket ID's
+that would be considered invalid under normal circumstances. Requesting an
+allocation to take place from a specified externally allocated memory is a
+matter of supplying the correct socket ID to DPDK allocator, either directly
+(e.g. through a call to ``rte_malloc``) or indirectly (through data
+structure-specific allocation API's such as ``rte_ring_create``).
+
+Since there is no way DPDK can verify whether memory area is available or valid,
+this responsibility falls on the shoulders of the user. All multiprocess
+synchronization is also the user's responsibility, as well as ensuring that all
+calls to add/attach/detach/remove memory are done in the correct order. It is
+not required to attach to a memory area in all processes - only attach to memory
+areas as needed.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Create a named heap
+* Add memory area(s) to the heap
+    - If IOVA table is not specified, IOVA addresses will be assumed to be
+      unavailable, and DMA mappings will not be performed
+    - Other processes must attach to the memory area before they can use it
+* Get socket ID used for the heap
+* Use normal DPDK allocation procedures, using supplied socket ID
+* If memory area is no longer needed, it can be removed from the heap
+    - Other processes must detach from this memory area before it can be removed
+* If heap is no longer needed, remove it
+    - Socket ID will become invalid and will not be reused
+
+For more information, please refer to ``rte_malloc`` API documentation,
+specifically the ``rte_malloc_heap_*`` family of function calls.
+
 PCI Access
 ~~~~~~~~~~
 
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [dpdk-dev] [PATCH v8 19/21] app/testpmd: add support for external memory
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 19/21] app/testpmd: add support for external memory Anatoly Burakov
@ 2018-10-01 15:11               ` Iremonger, Bernard
  2018-10-01 15:23                 ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Iremonger, Bernard @ 2018-10-01 15:11 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Lu, Wenzhuo, Wu, Jingjing, Mcnamara, John, Kovacevic, Marko,
	laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, Wiles, Keith, Richardson, Bruce,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Hi Anatoly,

> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Monday, October 1, 2018 1:56 PM
> To: dev@dpdk.org
> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Iremonger, Bernard <bernard.iremonger@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>; Kovacevic, Marko
> <marko.kovacevic@intel.com>; laszlo.madarassy@ericsson.com;
> laszlo.vadkerti@ericsson.com; andras.kovacs@ericsson.com;
> winnie.tian@ericsson.com; daniel.andrasi@ericsson.com;
> janos.kobor@ericsson.com; geza.koblo@ericsson.com;
> srinath.mannam@broadcom.com; scott.branden@broadcom.com;
> ajit.khaparde@broadcom.com; Wiles, Keith <keith.wiles@intel.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net;
> shreyansh.jain@nxp.com; shahafs@mellanox.com;
> arybchenko@solarflare.com; alejandro.lucero@netronome.com
> Subject: [PATCH v8 19/21] app/testpmd: add support for external memory
> 
> Currently, mempools can only be allocated either using native DPDK memory, or
> anonymous memory. This patch will add two new methods to allocate mempool
> using external memory (regular or hugepage memory), and add documentation
> about it to testpmd user guide.
> 
> It adds a new flag "--mp-alloc", with four possible values:
> native (use regular DPDK allocator), anon (use anonymous mempool), xmem
> (use externally allocated memory area), and xmemhuge (use externally allocated
> hugepage memory area). Old flag "--mp-anon" is kept for compatibility.
> 
> All external memory is allocated using the same external heap, but each will
> allocate and add a new memory area.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  app/test-pmd/config.c                 |  21 +-
>  app/test-pmd/parameters.c             |  23 +-
>  app/test-pmd/testpmd.c                | 320 ++++++++++++++++++++++++--
>  app/test-pmd/testpmd.h                |  13 +-
>  doc/guides/testpmd_app_ug/run_app.rst |  12 +
>  5 files changed, 364 insertions(+), 25 deletions(-)
> 
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index 794aa5268..3b921cfc6 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -2423,6 +2423,23 @@ fwd_config_setup(void)
>  		simple_fwd_config_setup();
>  }
> 
> +static const char *
> +mp_alloc_to_str(uint8_t mode)
> +{
> +	switch (mode) {
> +	case MP_ALLOC_NATIVE:
> +		return "native";
> +	case MP_ALLOC_ANON:
> +		return "anon";
> +	case MP_ALLOC_XMEM:
> +		return "xmem";
> +	case MP_ALLOC_XMEM_HUGE:
> +		return "xmemhuge";
> +	default:
> +		return "invalid";
> +	}
> +}
> +
>  void
>  pkt_fwd_config_display(struct fwd_config *cfg)
>  {
> @@ -2431,12 +2448,12 @@ pkt_fwd_config_display(struct fwd_config *cfg)
>  	streamid_t sm_id;
> 
>  	printf("%s packet forwarding%s - ports=%d - cores=%d - streams=%d - "
> -		"NUMA support %s, MP over anonymous pages %s\n",
> +		"NUMA support %s, MP allocation mode: %s\n",
>  		cfg->fwd_eng->fwd_mode_name,
>  		retry_enabled == 0 ? "" : " with retry",
>  		cfg->nb_fwd_ports, cfg->nb_fwd_lcores, cfg->nb_fwd_streams,
>  		numa_support == 1 ? "enabled" : "disabled",
> -		mp_anon != 0 ? "enabled" : "disabled");
> +		mp_alloc_to_str(mp_alloc_type));
> 
>  	if (retry_enabled)
>  		printf("TX retry num: %u, delay between TX retries: %uus\n",
> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
> index 9220e1c1b..b4016668c 100644
> --- a/app/test-pmd/parameters.c
> +++ b/app/test-pmd/parameters.c
> @@ -190,6 +190,11 @@ usage(char* progname)
>  	printf("  --vxlan-gpe-port=N: UPD port of tunnel VXLAN-GPE\n");
>  	printf("  --mlockall: lock all memory\n");
>  	printf("  --no-mlockall: do not lock all memory\n");
> +	printf("  --mp-alloc <native|anon|xmem|xmemhuge>: mempool allocation method.\n"
> +	       "    native: use regular DPDK memory to create and populate mempool\n"
> +	       "    anon: use regular DPDK memory to create and anonymous memory to populate mempool\n"
> +	       "    xmem: use anonymous memory to create and populate mempool\n"
> +	       "    xmemhuge: use anonymous hugepage memory to create and populate mempool\n");
>  }
> 
>  #ifdef RTE_LIBRTE_CMDLINE
> @@ -625,6 +630,7 @@ launch_args_parse(int argc, char** argv)
>  		{ "vxlan-gpe-port",		1, 0, 0 },
>  		{ "mlockall",			0, 0, 0 },
>  		{ "no-mlockall",		0, 0, 0 },
> +		{ "mp-alloc",			1, 0, 0 },
>  		{ 0, 0, 0, 0 },
>  	};
> 
> @@ -743,7 +749,22 @@ launch_args_parse(int argc, char** argv)
>  			if (!strcmp(lgopts[opt_idx].name, "numa"))
>  				numa_support = 1;
>  			if (!strcmp(lgopts[opt_idx].name, "mp-anon")) {
> -				mp_anon = 1;
> +				mp_alloc_type = MP_ALLOC_ANON;
> +			}
> +			if (!strcmp(lgopts[opt_idx].name, "mp-alloc")) {
> +				if (!strcmp(optarg, "native"))
> +					mp_alloc_type = MP_ALLOC_NATIVE;
> +				else if (!strcmp(optarg, "anon"))
> +					mp_alloc_type = MP_ALLOC_ANON;
> +				else if (!strcmp(optarg, "xmem"))
> +					mp_alloc_type = MP_ALLOC_XMEM;
> +				else if (!strcmp(optarg, "xmemhuge"))
> +					mp_alloc_type = MP_ALLOC_XMEM_HUGE;
> +				else
> +					rte_exit(EXIT_FAILURE,
> +						"mp-alloc %s invalid - must be: "
> +						"native, anon or xmem\n",

Should xmemhuge be added to above line?

> +						 optarg);
>  			}
>  			if (!strcmp(lgopts[opt_idx].name, "port-numa-config")) {
>  				if (parse_portnuma_config(optarg))
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 001f0e552..255a9c664 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -27,6 +27,7 @@
>  #include <rte_log.h>
>  #include <rte_debug.h>
>  #include <rte_cycles.h>
> +#include <rte_malloc_heap.h>
>  #include <rte_memory.h>
>  #include <rte_memcpy.h>
>  #include <rte_launch.h>
> @@ -63,6 +64,22 @@
> 
>  #include "testpmd.h"
> 
> +#ifndef MAP_HUGETLB
> +/* FreeBSD may not have MAP_HUGETLB (in fact, it probably doesn't) */
> +#define HUGE_FLAG (0x40000)
> +#else
> +#define HUGE_FLAG MAP_HUGETLB
> +#endif
> +
> +#ifndef MAP_HUGE_SHIFT
> +/* older kernels (or FreeBSD) will not have this define */
> +#define HUGE_SHIFT (26)
> +#else
> +#define HUGE_SHIFT MAP_HUGE_SHIFT
> +#endif
> +
> +#define EXTMEM_HEAP_NAME "extmem"
> +
>  uint16_t verbose_level = 0; /**< Silent by default. */
>  int testpmd_logtype; /**< Log type for testpmd logs */
> 
> @@ -88,9 +105,13 @@ uint8_t numa_support = 1; /**< numa enabled by default */
>  uint8_t socket_num = UMA_NO_CONFIG;
> 
>  /*
> - * Use ANONYMOUS mapped memory (might be not physically continuous) for mbufs.
> + * Select mempool allocation type:
> + * - native: use regular DPDK memory
> + * - anon: use regular DPDK memory to create mempool, but populate using
> + *         anonymous memory (may not be IOVA-contiguous)
> + * - xmem: use externally allocated hugepage memory
>   */
> -uint8_t mp_anon = 0;
> +uint8_t mp_alloc_type = MP_ALLOC_NATIVE;
> 
>  /*
>   * Store specified sockets on which memory pool to be used by ports
> @@ -527,6 +548,231 @@ set_def_fwd_config(void)
>  	set_default_fwd_ports_config();
>  }
> 
> +/* extremely pessimistic estimation of memory required to create a mempool */
> +static int
> +calc_mem_size(uint32_t nb_mbufs, uint32_t mbuf_sz, size_t pgsz, size_t *out)
> +{
> +	unsigned int n_pages, mbuf_per_pg, leftover;
> +	uint64_t total_mem, mbuf_mem, obj_sz;
> +
> +	/* there is no good way to predict how much space the mempool will
> +	 * occupy because it will allocate chunks on the fly, and some of those
> +	 * will come from default DPDK memory while some will come from our
> +	 * external memory, so just assume 128MB will be enough for everyone.
> +	 */
> +	uint64_t hdr_mem = 128 << 20;
> +
> +	/* account for possible non-contiguousness */
> +	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
> +	if (obj_sz > pgsz) {
> +		TESTPMD_LOG(ERR, "Object size is bigger than page size\n");
> +		return -1;
> +	}
> +
> +	mbuf_per_pg = pgsz / obj_sz;
> +	leftover = (nb_mbufs % mbuf_per_pg) > 0;
> +	n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
> +
> +	mbuf_mem = n_pages * pgsz;
> +
> +	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
> +
> +	if (total_mem > SIZE_MAX) {
> +		TESTPMD_LOG(ERR, "Memory size too big\n");
> +		return -1;
> +	}
> +	*out = (size_t)total_mem;
> +
> +	return 0;
> +}
> +
> +static inline uint32_t
> +bsf64(uint64_t v)
> +{
> +	return (uint32_t)__builtin_ctzll(v);
> +}
> +
> +static inline uint32_t
> +log2_u64(uint64_t v)
> +{
> +	if (v == 0)
> +		return 0;
> +	v = rte_align64pow2(v);
> +	return bsf64(v);
> +}
> +
> +static int
> +pagesz_flags(uint64_t page_sz)
> +{
> +	/* as per mmap() manpage, all page sizes are log2 of page size
> +	 * shifted by MAP_HUGE_SHIFT
> +	 */
> +	int log2 = log2_u64(page_sz);

Missing blank line after declarations.

> +	return (log2 << HUGE_SHIFT);
> +}
> +
> +static void *
> +alloc_mem(size_t memsz, size_t pgsz, bool huge)
> +{
> +	void *addr;
> +	int flags;
> +
> +	/* allocate anonymous hugepages */
> +	flags = MAP_ANONYMOUS | MAP_PRIVATE;
> +	if (huge)
> +		flags |= HUGE_FLAG | pagesz_flags(pgsz);
> +
> +	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
> +	if (addr == MAP_FAILED)
> +		return NULL;
> +
> +	return addr;
> +}
> +
> +struct extmem_param {
> +	void *addr;
> +	size_t len;
> +	size_t pgsz;
> +	rte_iova_t *iova_table;
> +	unsigned int iova_table_len;
> +};
> +
> +static int
> +create_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, struct extmem_param *param,
> +		bool huge)
> +{
> +	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
> +			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
> +	unsigned int cur_page, n_pages, pgsz_idx;
> +	size_t mem_sz, cur_pgsz;
> +	rte_iova_t *iovas = NULL;
> +	void *addr;
> +	int ret;
> +
> +	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
> +		/* skip anything that is too big */
> +		if (pgsizes[pgsz_idx] > SIZE_MAX)
> +			continue;
> +
> +		cur_pgsz = pgsizes[pgsz_idx];
> +
> +		/* if we were told not to allocate hugepages, override */
> +		if (!huge)
> +			cur_pgsz = sysconf(_SC_PAGESIZE);
> +
> +		ret = calc_mem_size(nb_mbufs, mbuf_sz, cur_pgsz, &mem_sz);
> +		if (ret < 0) {
> +			TESTPMD_LOG(ERR, "Cannot calculate memory size\n");
> +			return -1;
> +		}
> +
> +		/* allocate our memory */
> +		addr = alloc_mem(mem_sz, cur_pgsz, huge);
> +
> +		/* if we couldn't allocate memory with a specified page size,
> +		 * that doesn't mean we can't do it with other page sizes, so
> +		 * try another one.
> +		 */
> +		if (addr == NULL)
> +			continue;
> +
> +		/* store IOVA addresses for every page in this memory area */
> +		n_pages = mem_sz / cur_pgsz;
> +
> +		iovas = malloc(sizeof(*iovas) * n_pages);
> +
> +		if (iovas == NULL) {
> +			TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n");
> +			goto fail;
> +		}
> +		/* lock memory if it's not huge pages */
> +		if (!huge)
> +			mlock(addr, mem_sz);
> +
> +		/* populate IOVA addresses */
> +		for (cur_page = 0; cur_page < n_pages; cur_page++) {
> +			rte_iova_t iova;
> +			size_t offset;
> +			void *cur;
> +
> +			offset = cur_pgsz * cur_page;
> +			cur = RTE_PTR_ADD(addr, offset);
> +			iova = rte_mem_virt2iova(cur);
> +
> +			iovas[cur_page] = iova;
> +		}
> +
> +		break;
> +	}
> +	/* if we couldn't allocate anything */
> +	if (iovas == NULL)
> +		return -1;
> +
> +	param->addr = addr;
> +	param->len = mem_sz;
> +	param->pgsz = cur_pgsz;
> +	param->iova_table = iovas;
> +	param->iova_table_len = n_pages;
> +
> +	return 0;
> +fail:
> +	if (iovas)
> +		free(iovas);
> +	if (addr)
> +		munmap(addr, mem_sz);
> +
> +	return -1;
> +}
> +
> +static int
> +setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge)
> +{
> +	struct extmem_param param;
> +	int socket_id, ret;
> +
> +	memset(&param, 0, sizeof(param));
> +
> +	/* check if our heap exists */
> +	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
> +	if (socket_id < 0) {
> +		/* create our heap */
> +		ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
> +		if (ret < 0) {
> +			TESTPMD_LOG(ERR, "Cannot create heap\n");
> +			return -1;
> +		}
> +	}
> +
> +	ret = create_extmem(nb_mbufs, mbuf_sz, &param, huge);
> +	if (ret < 0) {
> +		TESTPMD_LOG(ERR, "Cannot create memory area\n");
> +		return -1;
> +	}
> +
> +	/* we now have a valid memory area, so add it to heap */
> +	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
> +			param.addr, param.len, param.iova_table,
> +			param.iova_table_len, param.pgsz);
> +
> +	/* when using VFIO, memory is automatically mapped for DMA by EAL */
> +
> +	/* not needed any more */
> +	free(param.iova_table);
> +
> +	if (ret < 0) {
> +		TESTPMD_LOG(ERR, "Cannot add memory to heap\n");
> +		munmap(param.addr, param.len);
> +		return -1;
> +	}
> +
> +	/* success */
> +
> +	TESTPMD_LOG(DEBUG, "Allocated %zuMB of external memory\n",
> +			param.len >> 20);
> +
> +	return 0;
> +}
> +
>  /*
>   * Configuration initialisation done once at init time.
>   */
> @@ -545,27 +791,59 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
>  		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
>  		pool_name, nb_mbuf, mbuf_seg_size, socket_id);
> 
> -	if (mp_anon != 0) {
> -		rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
> -			mb_size, (unsigned) mb_mempool_cache,
> -			sizeof(struct rte_pktmbuf_pool_private),
> -			socket_id, 0);
> -		if (rte_mp == NULL)
> -			goto err;
> +	switch (mp_alloc_type) {
> +	case MP_ALLOC_NATIVE:
> +		{
> +			/* wrapper to rte_mempool_create() */
> +			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
> +					rte_mbuf_best_mempool_ops());
> +			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
> +				mb_mempool_cache, 0, mbuf_seg_size, socket_id);
> +			break;
> +		}
> +	case MP_ALLOC_ANON:
> +		{
> +			rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
> +				mb_size, (unsigned int) mb_mempool_cache,
> +				sizeof(struct rte_pktmbuf_pool_private),
> +				socket_id, 0);
> +			if (rte_mp == NULL)
> +				goto err;
> +
> +			if (rte_mempool_populate_anon(rte_mp) == 0) {
> +				rte_mempool_free(rte_mp);
> +				rte_mp = NULL;
> +				goto err;
> +			}
> +			rte_pktmbuf_pool_init(rte_mp, NULL);
> +			rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
> +			break;
> +		}
> +	case MP_ALLOC_XMEM:
> +	case MP_ALLOC_XMEM_HUGE:
> +		{
> +			int heap_socket;
> +			bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE;
> 
> -		if (rte_mempool_populate_anon(rte_mp) == 0) {
> -			rte_mempool_free(rte_mp);
> -			rte_mp = NULL;
> -			goto err;
> +			if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0)
> +				rte_exit(EXIT_FAILURE, "Could not create external memory\n");
> +
> +			heap_socket =
> +				rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
> +			if (heap_socket < 0)
> +				rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");
> +
> +			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
> +					rte_mbuf_best_mempool_ops());
> +			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
> +					mb_mempool_cache, 0, mbuf_seg_size,
> +					heap_socket);
> +			break;
> +		}
> +	default:
> +		{
> +			rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n");
>  		}
> -		rte_pktmbuf_pool_init(rte_mp, NULL);
> -		rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
> -	} else {
> -		/* wrapper to rte_mempool_create() */
> -		TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
> -				rte_mbuf_best_mempool_ops());
> -		rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
> -			mb_mempool_cache, 0, mbuf_seg_size, socket_id);
>  	}
> 
>  err:
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index a1f661472..65e0cec90 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -69,6 +69,16 @@ enum {
>  	PORT_TOPOLOGY_LOOP,
>  };
> 
> +enum {
> +	MP_ALLOC_NATIVE, /**< allocate and populate mempool natively */
> +	MP_ALLOC_ANON,
> +	/**< allocate mempool natively, but populate using anonymous memory */
> +	MP_ALLOC_XMEM,
> +	/**< allocate and populate mempool using anonymous memory */
> +	MP_ALLOC_XMEM_HUGE
> +	/**< allocate and populate mempool using anonymous hugepage memory */
> +};
> +
>  #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
>  /**
>   * The data structure associated with RX and TX packet burst statistics
> @@ -304,7 +314,8 @@ extern uint8_t  numa_support; /**< set by "--numa" parameter */
>  extern uint16_t port_topology; /**< set by "--port-topology" parameter */
>  extern uint8_t no_flush_rx; /**<set by "--no-flush-rx" parameter */
>  extern uint8_t flow_isolate_all; /**< set by "--flow-isolate-all */
> -extern uint8_t  mp_anon; /**< set by "--mp-anon" parameter */
> +extern uint8_t  mp_alloc_type;
> +/**< set by "--mp-anon" or "--mp-alloc" parameter */
>  extern uint8_t no_link_check; /**<set by "--disable-link-check" parameter */
>  extern volatile int test_done; /* stop packet forwarding when set to 1. */
>  extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
> diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
> index f301c2b6f..67a8532a4 100644
> --- a/doc/guides/testpmd_app_ug/run_app.rst
> +++ b/doc/guides/testpmd_app_ug/run_app.rst
> @@ -498,3 +498,15 @@ The commandline options are:
>  *   ``--no-mlockall``
> 
>      Disable locking all memory.
> +
> +*   ``--mp-alloc <native|anon|xmem|xmemhuge>``
> +
> +    Select mempool allocation mode:
> +
> +    * native: create and populate mempool using native DPDK memory
> +    * anon: create mempool using native DPDK memory, but populate using
> +      anonymous memory
> +    * xmem: create and populate mempool using externally and anonymously
> +      allocated area
> +    * xmemhuge: create and populate mempool using externally and anonymously
> +      allocated hugepage area
> --
> 2.17.1

The following checkpatch warnings in testpmd.c should probably be fixed.

WARNING: line over 80 characters
#332: FILE: app/test-pmd/testpmd.c:685:
+                       TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n");

WARNING: line over 80 characters
#441: FILE: app/test-pmd/testpmd.c:798:
+                       TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",

WARNING: line over 80 characters
#476: FILE: app/test-pmd/testpmd.c:829:
+                               rte_exit(EXIT_FAILURE, "Could not create external memory\n");

WARNING: line over 80 characters
#481: FILE: app/test-pmd/testpmd.c:834:
+                               rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");

WARNING: line over 80 characters
#483: FILE: app/test-pmd/testpmd.c:836:
+                       TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",

WARNING: line over 80 characters
#492: FILE: app/test-pmd/testpmd.c:845:
+                       rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n");

Regards,

Bernard.

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [dpdk-dev] [PATCH v8 19/21] app/testpmd: add support for external memory
  2018-10-01 15:11               ` Iremonger, Bernard
@ 2018-10-01 15:23                 ` Burakov, Anatoly
  0 siblings, 0 replies; 225+ messages in thread
From: Burakov, Anatoly @ 2018-10-01 15:23 UTC (permalink / raw)
  To: Iremonger, Bernard, dev
  Cc: Lu, Wenzhuo, Wu, Jingjing, Mcnamara, John, Kovacevic, Marko,
	laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, Wiles, Keith, Richardson, Bruce,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Hi Bernard,

Thanks for your review! Comments inline.

>> +					rte_exit(EXIT_FAILURE,
>> +						"mp-alloc %s invalid - must be: "
>> +						"native, anon or xmem\n",
> 
> Should xmemhuge be added to above line?
> 

Yes :)

>> +						 optarg);
>>   			}
>>   			if (!strcmp(lgopts[opt_idx].name, "port-numa-config")) {
>>   				if (parse_portnuma_config(optarg))
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
>> index 001f0e552..255a9c664 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -27,6 +27,7 @@
>>   #include <rte_log.h>
>>   #include <rte_debug.h>

<snip>

>> +static int
>> +pagesz_flags(uint64_t page_sz)
>> +{
>> +	/* as per mmap() manpage, all page sizes are log2 of page size
>> +	 * shifted by MAP_HUGE_SHIFT
>> +	 */
>> +	int log2 = log2_u64(page_sz);
> 
> Missing blank line after declarations.
> 

Thanks, will fix.
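
For reference, a standalone sketch of the flag computation in the quoted function, with the blank line after the declaration in place. The names prefixed with example_ are made up for this illustration, and the shift value of 26 is an assumption that matches MAP_HUGE_SHIFT on Linux; it is not taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 26 matches MAP_HUGE_SHIFT on Linux; defined locally so the
 * sketch does not depend on <sys/mman.h>.
 */
#define EXAMPLE_HUGE_SHIFT 26

/* Hypothetical helper: floor(log2(v)) for a non-zero v. */
static int
example_log2_u64(uint64_t v)
{
	int n = 0;

	assert(v != 0);
	while (v >>= 1)
		n++;

	return n;
}

/* Encode a page size as mmap() flag bits: log2 of the size, shifted up. */
static int
example_pagesz_flags(uint64_t page_sz)
{
	int lg2 = example_log2_u64(page_sz);

	return lg2 << EXAMPLE_HUGE_SHIFT;
}
```

With these definitions a 2MB page size encodes as 21 << 26 and a 1GB page size as 30 << 26, which is the same encoding the mmap() manpage describes for MAP_HUGE_2MB and MAP_HUGE_1GB.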

>> +	return (log2 << HUGE_SHIFT);
>> +}
>> +
>> +static void *
>> +alloc_mem(size_t memsz, size_t pgsz, bool huge) {
>> +	void *addr;
>> +	int flags;
>> +
>> +	/* allocate anon

<snip>

>> --
>> 2.17.1
> 
> The following checkpatch warnings in testpmd.c should probably be fixed.
> 
> WARNING: line over 80 characters
> #332: FILE: app/test-pmd/testpmd.c:685:
> +                       TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n");
> 
> WARNING: line over 80 characters
> #441: FILE: app/test-pmd/testpmd.c:798:
> +                       TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
> 
> WARNING: line over 80 characters
> #476: FILE: app/test-pmd/testpmd.c:829:
> +                               rte_exit(EXIT_FAILURE, "Could not create external memory\n");
> 
> WARNING: line over 80 characters
> #481: FILE: app/test-pmd/testpmd.c:834:
> +                               rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");
> 
> WARNING: line over 80 characters
> #483: FILE: app/test-pmd/testpmd.c:836:
> +                       TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
> 
> WARNING: line over 80 characters
> #492: FILE: app/test-pmd/testpmd.c:845:
> +                       rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n");
> 
> Regards,
> 
> Bernard.
> 
> 

These should be ignored (and they indeed are ignored by DPDK's own 
checkpatch script). Strings are allowed to go over 80 characters so as 
to not make it hard to grep for them.
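
A small illustration of the grep argument, using one of the messages from the warnings above. The variable names are invented for the example. Both definitions compile to the same string, but only the first can be found in the source by searching for the full phrase.

```c
#include <string.h>

/* Greppable: the whole message sits on one source line, even though the
 * line runs past 80 columns.
 */
static const char *example_msg_one_line = "Could not get external memory socket ID\n";

/* Not greppable as a phrase: the compiler joins the adjacent literals, but
 * a search of the source for "external memory socket" finds nothing.
 */
static const char *example_msg_split = "Could not get external "
	"memory socket ID\n";
```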

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-10-01 17:01               ` Stephen Hemminger
  2018-10-02  9:03                 ` Burakov, Anatoly
  0 siblings, 1 reply; 225+ messages in thread
From: Stephen Hemminger @ 2018-10-01 17:01 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, John McNamara, Marko Kovacevic, Bruce Richardson,
	laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

On Mon,  1 Oct 2018 13:56:09 +0100
Anatoly Burakov <anatoly.burakov@intel.com> wrote:

> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index aff0688dd..1d8b0a6fe 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -30,6 +30,7 @@ struct rte_memseg_list {
>  		uint64_t addr_64;
>  		/**< Makes sure addr is always 64-bits */
>  	};
> +	size_t len; /**< Length of memory area covered by this memseg list. */
>  	int socket_id; /**< Socket ID for all memsegs in this list. */
>  	uint64_t page_sz; /**< Page size for all memsegs in this list. */
>  	volatile uint32_t version; /**< version number for multiprocess sync. */

If you are going to break ABI, why not try and rearrange to eliminate holes:

Output of pahole (on x86 64 bit):

struct rte_memseg_list {
	union {
		void *             base_va;              /*     0     8 */
		uint64_t           addr_64;              /*     0     8 */
	};                                               /*     0     8 */
	size_t                     len;                  /*     8     8 */
	int                        socket_id;            /*    16     4 */

	/* XXX 4 bytes hole, try to pack */

	uint64_t                   page_sz;              /*    24     8 */
	volatile uint32_t          version;              /*    32     4 */

	/* XXX 4 bytes hole, try to pack */

	struct rte_fbarray         memseg_arr;           /*    40    96 */

	/* XXX last struct has 4 bytes of padding */

	/* size: 136, cachelines: 3, members: 6 */
	/* sum members: 128, holes: 2, sum holes: 8 */
	/* paddings: 1, sum paddings: 4 */
	/* last cacheline: 8 bytes */
};

^ permalink raw reply	[flat|nested] 225+ messages in thread

* Re: [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list
  2018-10-01 17:01               ` Stephen Hemminger
@ 2018-10-02  9:03                 ` Burakov, Anatoly
  0 siblings, 0 replies; 225+ messages in thread
From: Burakov, Anatoly @ 2018-10-02  9:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, John McNamara, Marko Kovacevic, Bruce Richardson,
	laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

On 01-Oct-18 6:01 PM, Stephen Hemminger wrote:
> On Mon,  1 Oct 2018 13:56:09 +0100
> Anatoly Burakov <anatoly.burakov@intel.com> wrote:
> 
>> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
>> index aff0688dd..1d8b0a6fe 100644
>> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
>> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
>> @@ -30,6 +30,7 @@ struct rte_memseg_list {
>>   		uint64_t addr_64;
>>   		/**< Makes sure addr is always 64-bits */
>>   	};
>> +	size_t len; /**< Length of memory area covered by this memseg list. */
>>   	int socket_id; /**< Socket ID for all memsegs in this list. */
>>   	uint64_t page_sz; /**< Page size for all memsegs in this list. */
>>   	volatile uint32_t version; /**< version number for multiprocess sync. */
> 
> If you are going to break ABI, why not try and rearrange to eliminate holes:
> 
> Output of pahole (on x86 64 bit):
> 
> struct rte_memseg_list {
> 	union {
> 		void *             base_va;              /*     0     8 */
> 		uint64_t           addr_64;              /*     0     8 */
> 	};                                               /*     0     8 */
> 	size_t                     len;                  /*     8     8 */
> 	int                        socket_id;            /*    16     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	uint64_t                   page_sz;              /*    24     8 */
> 	volatile uint32_t          version;              /*    32     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	struct rte_fbarray         memseg_arr;           /*    40    96 */
> 
> 	/* XXX last struct has 4 bytes of padding */
> 
> 	/* size: 136, cachelines: 3, members: 6 */
> 	/* sum members: 128, holes: 2, sum holes: 8 */
> 	/* paddings: 1, sum paddings: 4 */
> 	/* last cacheline: 8 bytes */
> };
> 

Hi Stephen,

This data structure isn't performance-critical in any remote sense, but 
sure, I can do that.
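
A rough sketch of what the reordering buys, assuming an LP64 target such as x86-64. The struct example_fbarray below is a made-up 96-byte stand-in for struct rte_fbarray (whose real members are not reproduced here), and the example_ struct names are hypothetical. The first layout uses the v8 field order from the pahole output, the second groups the two 4-byte members together.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: 96 bytes, 8-byte aligned, like the pahole output
 * reports for struct rte_fbarray on x86-64.
 */
struct example_fbarray {
	uint64_t opaque[12];
};

/* v8 field order: each 4-byte member is followed by a 4-byte hole. */
struct example_msl_v8 {
	union {
		void *base_va;
		uint64_t addr_64;
	};
	size_t len;
	int socket_id;
	uint64_t page_sz;
	volatile uint32_t version;
	struct example_fbarray memseg_arr;
};

/* Reordered: socket_id and version share one 8-byte slot, no holes. */
struct example_msl_packed {
	union {
		void *base_va;
		uint64_t addr_64;
	};
	uint64_t page_sz;
	int socket_id;
	volatile uint32_t version;
	size_t len;
	struct example_fbarray memseg_arr;
};
```

On x86-64 the first layout is 136 bytes and the second is 128 bytes, so the two holes pahole reported are gone. Sizes on other ABIs may differ.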

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v9 00/21] Support externally allocated memory in DPDK
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-11  9:15                 ` Thomas Monjalon
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list Anatoly Burakov
                                 ` (20 subsequent siblings)
  21 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have IDs in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
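
As a toy model of the first few points above (not the code from this patchset), the sketch below keeps heaps in a flat array. Internal heaps are registered per NUMA node and keep the node number as their socket ID, while external heaps are named by the caller and receive a fresh socket ID from a counter. All names are hypothetical, and the starting value of 256 for external socket IDs is an assumption made for the example.

```c
#include <stdio.h>
#include <string.h>

#define EXAMPLE_MAX_HEAPS 8
#define EXAMPLE_NAME_LEN 32
#define EXAMPLE_EXT_SOCKET_BASE 256 /* hypothetical first external socket ID */

struct example_heap {
	char name[EXAMPLE_NAME_LEN]; /* empty name means the slot is unused */
	int socket_id;
};

static struct example_heap example_heaps[EXAMPLE_MAX_HEAPS];
static int example_next_ext_socket = EXAMPLE_EXT_SOCKET_BASE;

/* Store a uniquely named heap in the first free slot; return its index. */
static int
example_heap_register(const char *name, int socket_id)
{
	int i;

	if (name == NULL || name[0] == '\0' ||
			strlen(name) >= EXAMPLE_NAME_LEN)
		return -1;
	for (i = 0; i < EXAMPLE_MAX_HEAPS; i++)
		if (strcmp(example_heaps[i].name, name) == 0)
			return -1; /* heap names must be unique */
	for (i = 0; i < EXAMPLE_MAX_HEAPS; i++) {
		if (example_heaps[i].name[0] != '\0')
			continue;
		strcpy(example_heaps[i].name, name);
		example_heaps[i].socket_id = socket_id;
		return i;
	}
	return -1; /* no free slots */
}

/* Internal heap: the socket ID is the NUMA node, the index is not. */
static int
example_heap_create_internal(int numa_node)
{
	char name[EXAMPLE_NAME_LEN];

	snprintf(name, sizeof(name), "socket_%d", numa_node);
	return example_heap_register(name, numa_node);
}

/* External heap: the socket ID comes from a counter, in creation order. */
static int
example_heap_create_external(const char *name)
{
	int idx = example_heap_register(name, example_next_ext_socket);

	if (idx >= 0)
		example_next_ext_socket++;
	return idx;
}

/* Look up the socket ID of a named heap, or -1 if there is none. */
static int
example_heap_get_socket(const char *name)
{
	int i;

	if (name == NULL || name[0] == '\0')
		return -1;
	for (i = 0; i < EXAMPLE_MAX_HEAPS; i++)
		if (strcmp(example_heaps[i].name, name) == 0)
			return example_heaps[i].socket_id;
	return -1;
}

/* What an allocator would do: map a requested socket ID to a heap index. */
static int
example_socket_to_heap_idx(int socket_id)
{
	int i;

	for (i = 0; i < EXAMPLE_MAX_HEAPS; i++)
		if (example_heaps[i].name[0] != '\0' &&
				example_heaps[i].socket_id == socket_id)
			return i;
	return -1;
}
```

On a machine with NUMA nodes 0 and 2, the internal heaps in this model land at indices 0 and 1, and an external heap created afterwards lands at index 2 with socket ID 256. A caller then passes that socket ID wherever a NUMA socket ID would normally go.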

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create a heap and add memory to it. Any
other process wishing to use the same memory must first attach to it
(otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.

Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

v9 -> v8 changes:
- Rebase on latest master
- Minor cosmetic testpmd changes as per Bernard's feedback
- Pack structures better (Stephen's suggestion)
- Touch pages before finding their IOVA address

v8 -> v7 changes:
- Rebase on latest master
- More documentation on ABI changes

v7 -> v6 changes:
- Fixed missing IOVA address setup in testpmd
- Fixed MLX drivers as per Yongseok's comments
- Added a check for invalid heap idx on adding memory to heap

v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments

v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes

v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
  IOVA-contiguousness when dealing with externally allocated memory

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
  comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (21):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: add function to check if socket is external
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  malloc: enable event callbacks for external memory
  test: add unit tests for external memory support
  app/testpmd: add support for external memory
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide

 app/test-pmd/config.c                         |  21 +-
 app/test-pmd/parameters.c                     |  23 +-
 app/test-pmd/testpmd.c                        | 325 ++++++++++++-
 app/test-pmd/testpmd.h                        |  13 +-
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  37 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  36 +-
 doc/guides/testpmd_app_ug/run_app.rst         |  12 +
 drivers/bus/fslmc/fslmc_vfio.c                |  13 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx5/mlx5.c                       |   4 +-
 drivers/net/virtio/virtio_user/vhost_kernel.c |   3 +
 .../net/virtio/virtio_user/virtio_user_dev.c  |   6 +
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |  11 +-
 lib/librte_eal/common/include/rte_malloc.h    | 192 ++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_elem.c           |  10 +-
 lib/librte_eal/common/malloc_heap.c           | 320 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 429 +++++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  27 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   8 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  57 ++-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 ++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 45 files changed, 1936 insertions(+), 138 deletions(-)
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
                                 ` (19 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, Bruce Richardson,
	laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.

This breaks ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking ABI, pack the members a little
better.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 doc/guides/rel_notes/release_18_11.rst            | 8 +++++++-
 drivers/bus/pci/linux/pci.c                       | 2 +-
 lib/librte_eal/bsdapp/eal/Makefile                | 2 +-
 lib/librte_eal/bsdapp/eal/eal_memory.c            | 2 ++
 lib/librte_eal/common/eal_common_memory.c         | 5 ++---
 lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++-
 lib/librte_eal/linuxapp/eal/Makefile              | 2 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        | 3 ++-
 lib/librte_eal/linuxapp/eal/eal_memory.c          | 4 +++-
 lib/librte_eal/meson.build                        | 2 +-
 10 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index a8327ea77..58bb79022 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -153,6 +153,12 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK:
+         - structure ``rte_memseg_list`` now has a new field indicating length
+           of memory addressed by the segment list
+
+
 Removed Items
 -------------
 
@@ -198,7 +204,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
    + librte_eventdev.so.6
      librte_flow_classify.so.1
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
 static int
 find_max_end_va(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t sz = msl->memseg_arr.len * msl->page_sz;
+	size_t sz = msl->len;
 	void *end_va = RTE_PTR_ADD(msl->base_va, sz);
 	void **max_va = arg;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
 		}
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
+		msl->len = internal_config.memory;
 		msl->socket_id = 0;
 
 		/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
 
 	/* a memseg list was specified, check if it's the right one */
 	start = msl->base_va;
-	end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+	end = RTE_PTR_ADD(start, msl->len);
 
 	if (addr < start || addr >= end)
 		return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
 		msl = &mcfg->memsegs[msl_idx];
 
 		start = msl->base_va;
-		end = RTE_PTR_ADD(start,
-				(size_t)msl->page_sz * msl->memseg_arr.len);
+		end = RTE_PTR_ADD(start, msl->len);
 		if (addr >= start && addr < end)
 			break;
 	}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d2362985 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,9 +30,10 @@ struct rte_memseg_list {
 		uint64_t addr_64;
 		/**< Makes sure addr is always 64-bits */
 	};
-	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	int socket_id; /**< Socket ID for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
+	size_t len; /**< Length of memory area covered by this memseg list. */
 	struct rte_fbarray memseg_arr;
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	int msl_idx, seg_idx, ret, dir_fd = -1;
 
 	start_addr = (uintptr_t) msl->base_va;
-	end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+	end_addr = start_addr + msl->len;
 
 	if ((uintptr_t)wa->ms->addr < start_addr ||
 			(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 		return -1;
 	}
 	local_msl->base_va = primary_msl->base_va;
+	local_msl->len = primary_msl->len;
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
 		return -1;
 	}
 	msl->base_va = addr;
+	msl->len = mem_sz;
 
 	return 0;
 }
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
 		msl->base_va = addr;
 		msl->page_sz = page_sz;
 		msl->socket_id = 0;
+		msl->len = internal_config.memory;
 
 		/* populate memsegs. each memseg is one page long */
 		for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
 		if (msl->memseg_arr.count > 0)
 			continue;
 		/* this is an unused list, deallocate it */
-		mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
 		msl->base_va = NULL;
 
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
-- 
2.17.1

^ permalink raw reply	[flat|nested] 225+ messages in thread

* [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
                                 ` (18 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
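
As a rough illustration of that adjustment, here is a minimal,
self-contained sketch of a list-walk callback that skips external
lists, modelled on the check added to physmem_size() in this patch.
It does not use the DPDK headers: struct demo_msl, demo_msl_walk()
and the other demo_* names are hypothetical stand-ins for
struct rte_memseg_list and rte_memseg_list_walk(), and the walker's
return-value handling is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_memseg_list (only fields used here). */
struct demo_msl {
	uint64_t page_sz;      /* page size of every segment in the list */
	unsigned int count;    /* number of segments currently in use */
	unsigned int external; /* 1 if the list points to external memory */
};

typedef int (*demo_msl_walk_t)(const struct demo_msl *msl, void *arg);

/* Simplified stand-in for rte_memseg_list_walk(): visit every list and
 * stop at the first callback that returns non-zero.
 */
static int
demo_msl_walk(const struct demo_msl *lists, size_t n, demo_msl_walk_t func,
		void *arg)
{
	size_t i;

	assert(func != NULL);

	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback that counts internal memory only. Returning 0 for an external
 * list skips it without aborting the walk.
 */
static int
demo_physmem_size(const struct demo_msl *msl, void *arg)
{
	uint64_t *total_len = arg;

	if (msl->external)
		return 0;

	*total_len += (uint64_t)msl->count * msl->page_sz;
	return 0;
}

/* Convenience wrapper: total bytes held in internal lists. */
static uint64_t
demo_internal_mem_size(const struct demo_msl *lists, size_t n)
{
	uint64_t total = 0;

	demo_msl_walk(lists, n, demo_physmem_size, &total);
	return total;
}
```

The point of the pattern is that "skip" is expressed as a zero
return, so a callback can ignore external lists without changing how
the walk itself terminates.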

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
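
The matching rule can be sketched as follows. This is a simplified,
self-contained model of the find_min_pagesz() logic in this patch,
not the DPDK code itself: struct demo_msl, demo_min_page_size() and
DEMO_SOCKET_ID_ANY are hypothetical names, the lists are passed in as
a plain array rather than walked, and the fallback argument stands in
for the getpagesize() call used by the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SOCKET_ID_ANY (-1) /* stand-in for DPDK's SOCKET_ID_ANY */

/* Hypothetical stand-in for struct rte_memseg_list (only fields used here). */
struct demo_msl {
	int socket_id;         /* socket ID of all segments in the list */
	size_t page_sz;        /* page size of all segments in the list */
	unsigned int external; /* 1 if the list points to external memory */
};

/* Return the smallest page size usable for the requested socket ID, or
 * fallback if no list qualifies.
 */
static size_t
demo_min_page_size(const struct demo_msl *lists, size_t n, int socket_id,
		size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(fallback != 0);

	for (i = 0; i < n; i++) {
		const struct demo_msl *msl = &lists[i];
		bool valid;

		/* an exact socket ID match covers internal and external lists */
		valid = msl->socket_id == socket_id;
		/* "any socket" covers internal lists only */
		if (socket_id == DEMO_SOCKET_ID_ANY && msl->external == 0)
			valid = true;

		if (valid && msl->page_sz < min)
			min = msl->page_sz;
	}
	return min == SIZE_MAX ? fallback : min;
}
```

With one 2M list on socket 0, one 1G list on socket 1 and one
external 4K list under its own socket ID, a request for "any socket"
yields 2M rather than 4K, because the external list is only
considered when its socket ID is requested explicitly.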

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        |  8 +++++
 drivers/bus/fslmc/fslmc_vfio.c                |  6 +++-
 drivers/net/mlx5/mlx5.c                       |  4 ++-
 drivers/net/virtio/virtio_user/vhost_kernel.c |  3 ++
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_elem.c           | 10 ++++--
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/common/rte_malloc.c            |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 19 files changed, 119 insertions(+), 38 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 58bb79022..bc1d56130 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -118,6 +118,12 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* eal: ``rte_memseg_list`` structure now has an additional flag indicating
+  whether the memseg list is externally allocated. This will have implications
+  for any users of memseg-walk-related functions, as they will now have to skip
+  externally allocated segments in most cases if the intent is to only iterate
+  over internal DPDK memory.
+
 * mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
   functions were deprecated since 17.05 and are replaced by
   ``rte_mbuf_raw_free()`` and ``rte_pktmbuf_prefree_seg()``.
@@ -157,6 +163,8 @@ ABI Changes
        supporting external memory in DPDK:
          - structure ``rte_memseg_list`` now has a new field indicating length
            of memory addressed by the segment list
+         - structure ``rte_memseg_list`` now has a new flag indicating whether
+           the memseg list refers to external memory
 
 
 Removed Items
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..cb33dd891 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -318,11 +318,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 
 static int
 fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+		const struct rte_memseg *ms, void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	/* if IOVA address is invalid, skip */
+	if (ms->iova == RTE_BAD_IOVA)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fd89e2af3..af4a78ce9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,13 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b3bfcb76f..990ce80ce 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -78,6 +78,9 @@ add_memseg_list(const struct rte_memseg_list *msl, void *arg)
 	void *start_addr;
 	uint64_t len;
 
+	if (msl->external)
+		return 0;
+
 	if (vm->nregions >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d2362985..645288b02 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -34,6 +34,7 @@ struct rte_memseg_list {
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	size_t len; /**< Length of memory area covered by this memseg list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	struct rte_fbarray memseg_arr;
 };
 
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 	contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
 
 	/* if we're in IOVA as VA mode, or if we're in legacy mode with
-	 * hugepages, all elements are IOVA-contiguous.
+	 * hugepages, all elements are IOVA-contiguous. however, we can only
+	 * make these assumptions about internal memory - externally allocated
+	 * segments have to be checked.
 	 */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA ||
-			(internal_config.legacy_mem && rte_eal_has_hugepages()))
+	if (!elem->msl->external &&
+			(rte_eal_iova_mode() == RTE_IOVA_VA ||
+				(internal_config.legacy_mem &&
+					rte_eal_has_hugepages())))
 		return RTE_PTR_DIFF(data_end, contig_seg_start);
 
 	cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
 	if (elem == NULL)
 		return RTE_BAD_IOVA;
 
-	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+	if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return (uintptr_t) addr;
 
 	ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1

* [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (2 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 04/21] mem: do not check for invalid socket ID Anatoly Burakov
                                 ` (17 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: Thomas Monjalon, Bruce Richardson, John McNamara,
	Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, shreyansh.jain, shahafs, arybchenko,
	alejandro.lucero

Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. The heap ID for a DPDK-internal heap is the
NUMA node's index within the detected NUMA node list. Heap IDs for
external heaps follow the order of their creation.
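
To make the socket-ID-to-heap-ID mapping concrete, here is a minimal,
self-contained sketch of the lookup, loosely modelled on the
malloc_socket_to_heap_id() helper added by this patch. The demo_*
names are hypothetical, the heap table is passed in explicitly
instead of being read from the shared memory config, and the number
of heaps in use is an explicit argument (an assumption of this
sketch; the helper in the patch scans all RTE_MAX_HEAPS entries).

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_HEAPS 32 /* same value this patch gives RTE_MAX_HEAPS */

/* Hypothetical stand-in for struct malloc_heap (only the field used here). */
struct demo_heap {
	unsigned int socket_id; /* socket ID this heap serves */
};

/* Linear search for the heap serving socket_id. Returns the heap's index
 * in the table (its heap ID), or -1 if no heap has that socket ID.
 */
static int
demo_socket_to_heap_id(const struct demo_heap *heaps, size_t n_heaps,
		unsigned int socket_id)
{
	size_t i;

	assert(n_heaps <= DEMO_MAX_HEAPS);

	for (i = 0; i < n_heaps; i++) {
		if (heaps[i].socket_id == socket_id)
			return (int)i;
	}
	return -1;
}
```

For example, on a machine where NUMA nodes 0 and 2 were detected, the
internal heaps get heap IDs 0 and 1 even though the second one serves
socket 2, and an external heap created afterwards (with a made-up
socket ID of 256 in this sketch) gets heap ID 2.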

This breaks the ABI, so document the changes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 doc/guides/rel_notes/release_18_11.rst        |   5 +-
 .../common/include/rte_eal_memconfig.h        |   4 +-
 .../common/include/rte_malloc_heap.h          |   1 +
 lib/librte_eal/common/malloc_heap.c           | 102 +++++++++++++-----
 lib/librte_eal/common/malloc_heap.h           |   3 +
 lib/librte_eal/common/rte_malloc.c            |  41 ++++---
 8 files changed, 114 insertions(+), 44 deletions(-)

diff --git a/config/common_base b/config/common_base
index acc5211bc..83350e0b1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
 CONFIG_RTE_LIBRTE_EAL=y
 CONFIG_RTE_MAX_LCORE=128
 CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
 CONFIG_RTE_MAX_MEMSEG_LISTS=64
 # each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
 # or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 20c58dff1..816e6f879 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
 #define RTE_BUILD_SHARED_LIB
 
 /* EAL defines */
+#define RTE_MAX_HEAPS 32
 #define RTE_MAX_MEMSEG_LISTS 128
 #define RTE_MAX_MEMSEG_PER_LIST 8192
 #define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc1d56130..0607a3980 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -165,7 +165,10 @@ ABI Changes
            of memory addressed by the segment list
          - structure ``rte_memseg_list`` now has a new flag indicating whether
            the memseg list refers to external memory
-
+         - structure ``rte_malloc_heap`` now has a new field indicating socket
+           ID the malloc heap belongs to
+         - structure ``rte_mem_config`` has had its ``malloc_heaps`` array
+           resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 645288b02..7634bff5d 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
 
 	struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
 
-	/* Heaps of Malloc per socket */
-	struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+	/* Heaps of Malloc */
+	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..d432cef88 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -26,6 +26,7 @@ struct malloc_heap {
 	struct malloc_elem *volatile last;
 
 	unsigned alloc_count;
+	unsigned int socket_id;
 	size_t total_size;
 } __rte_cache_aligned;
 
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..a9cfa423f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 	return check_flag & flags;
 }
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (heap->socket_id == socket_id)
+			return i;
+	}
+	return -1;
+}
+
 /*
  * Expand the heap with a memory area.
  */
@@ -93,12 +108,17 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct rte_memseg_list *found_msl;
 	struct malloc_heap *heap;
-	int msl_idx;
+	int msl_idx, heap_idx;
 
 	if (msl->external)
 		return 0;
 
-	heap = &mcfg->malloc_heaps[msl->socket_id];
+	heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+	if (heap_idx < 0) {
+		RTE_LOG(ERR, EAL, "Memseg list has invalid socket id\n");
+		return -1;
+	}
+	heap = &mcfg->malloc_heaps[heap_idx];
 
 	/* msl is const, so find it */
 	msl_idx = msl - mcfg->memsegs;
@@ -111,6 +131,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
+	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -561,12 +582,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
 
 /* this will try lower page sizes first */
 static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
-		unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+		unsigned int heap_id, unsigned int flags, size_t align,
+		size_t bound, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+	int socket_id;
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -584,12 +607,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
 	 * we may still be able to allocate memory from appropriate page sizes,
 	 * we just need to request more memory first.
 	 */
+
+	socket_id = rte_socket_id_by_idx(heap_id);
+	/*
+	 * if socket ID is negative, we cannot find a socket ID for this heap -
+	 * which means it's an external heap. those can have unexpected page
+	 * sizes, so if the user asked to allocate from there - assume user
+	 * knows what they're doing, and allow allocating from there with any
+	 * page size flags.
+	 */
+	if (socket_id < 0)
+		size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
 	ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
 	if (ret != NULL)
 		goto alloc_unlock;
 
-	if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
-			contig)) {
+	/* if socket ID is invalid, this is an external heap */
+	if (socket_id < 0)
+		goto alloc_unlock;
+
+	if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+			bound, contig)) {
 		ret = heap_alloc(heap, type, size, flags, align, bound, contig);
 
 		/* this should have succeeded */
@@ -605,7 +644,7 @@ void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, heap_id, i;
 	void *ret;
 
 	/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +659,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
-			contig);
+	ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+			bound, contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
 
-	/* try other heaps */
+	/* try other heaps. we are only iterating through native DPDK sockets,
+	 * so external heaps won't be included.
+	 */
 	for (i = 0; i < (int) rte_socket_count(); i++) {
-		cur_socket = rte_socket_id_by_idx(i);
-		if (cur_socket == socket)
+		if (i == heap_id)
 			continue;
-		ret = heap_alloc_on_socket(type, size, cur_socket, flags,
-				align, bound, contig);
+		ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+				bound, contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -643,11 +685,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 }
 
 static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
-		size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+		unsigned int flags, size_t align, bool contig)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+	struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 	void *ret;
 
 	rte_spinlock_lock(&(heap->lock));
@@ -665,7 +707,7 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		size_t align, bool contig)
 {
-	int socket, i, cur_socket;
+	int socket, i, cur_socket, heap_id;
 	void *ret;
 
 	/* return NULL if align is not power-of-2 */
@@ -680,11 +722,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 	else
 		socket = socket_arg;
 
-	/* Check socket parameter */
-	if (socket >= RTE_MAX_NUMA_NODES)
+	/* turn socket ID into heap ID */
+	heap_id = malloc_socket_to_heap_id(socket);
+	/* if heap id is negative, socket ID was invalid */
+	if (heap_id < 0)
 		return NULL;
 
-	ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+	ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
 			contig);
 	if (ret != NULL || socket_arg != SOCKET_ID_ANY)
 		return ret;
@@ -694,8 +738,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
 		cur_socket = rte_socket_id_by_idx(i);
 		if (cur_socket == socket)
 			continue;
-		ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
-				align, contig);
+		ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+				contig);
 		if (ret != NULL)
 			return ret;
 	}
@@ -760,7 +804,7 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* ...of which we can't avail if we are in legacy mode, or if this is an
 	 * externally allocated segment.
 	 */
-	if (internal_config.legacy_mem || msl->external)
+	if (internal_config.legacy_mem || (msl->external > 0))
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
@@ -917,7 +961,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 int
 malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +999,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 }
 
 /*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
  */
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
 void
 malloc_heap_dump(struct malloc_heap *heap, FILE *f);
 
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
 int
 rte_eal_malloc_heap_init(void);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 47ca5a742..73d6df31d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int heap_idx, ret = -1;
 
-	if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
-		return -1;
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
 
-	return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+	heap_idx = malloc_socket_to_heap_id(socket);
+	if (heap_idx < 0)
+		goto unlock;
+
+	ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+			socket_stats);
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
 }
 
 /*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int idx;
 
-	for (idx = 0; idx < rte_socket_count(); idx++) {
-		unsigned int socket = rte_socket_id_by_idx(idx);
-		fprintf(f, "Heap on socket %i:\n", socket);
-		malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		fprintf(f, "Heap id: %u\n", idx);
+		malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
 	}
 
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
 /*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
 void
 rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 {
-	unsigned int socket;
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int heap_id;
 	struct rte_malloc_socket_stats sock_stats;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
 	/* Iterate through all initialised heaps */
-	for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
-		if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
-			continue;
+	for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
 
-		fprintf(f, "Socket:%u\n", socket);
+		malloc_heap_get_stats(heap, &sock_stats);
+
+		fprintf(f, "Heap id:%u\n", heap_id);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
 		fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
 	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	return;
 }
 
-- 
2.17.1


* [dpdk-dev] [PATCH v9 04/21] mem: do not check for invalid socket ID
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (3 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 05/21] flow_classify: " Anatoly Burakov
                                 ` (16 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

External heaps will be assigned socket IDs that used to be
considered "invalid", and malloc can now verify whether a supplied
socket ID is in fact valid, rendering upper-bound checks on socket
ID parameters obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
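Not part of the patch: a minimal standalone sketch of the idea, assuming
a flat table of per-heap socket IDs. All `demo_*` and `DEMO_*` names and
the table contents are hypothetical stand-ins, not DPDK symbols. Callers
only reject negative IDs; a lookup over the heap table decides whether
an ID is valid, and the no-hugepages fallback leaves large IDs alone.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_NUMA_NODES 8   /* stand-in for RTE_MAX_NUMA_NODES */
#define DEMO_SOCKET_ID_ANY (-1) /* stand-in for SOCKET_ID_ANY */

/* Hypothetical heap table: socket ID per heap, internal heaps first. */
static const int demo_heap_sockets[] = { 0, 1, 256 };
#define DEMO_NUM_HEAPS \
	(sizeof(demo_heap_sockets) / sizeof(demo_heap_sockets[0]))

/* Return the index of the heap that owns socket_id, or -1 if none does. */
static int
demo_socket_to_heap_id(const int *sockets, size_t n, int socket_id)
{
	size_t i;

	assert(sockets != NULL);
	for (i = 0; i < n; i++) {
		if (sockets[i] == socket_id)
			return (int)i;
	}
	return -1;
}

/* Without hugepages, internal socket IDs collapse to "any socket".
 * IDs at or above the NUMA node limit are left alone, so that an
 * external heap can still be selected.
 */
static int
demo_effective_socket(int socket_id, int has_hugepages)
{
	if (!has_hugepages && socket_id < DEMO_MAX_NUMA_NODES)
		return DEMO_SOCKET_ID_ANY;
	return socket_id;
}
```

With this table, socket 256 resolves to heap index 2, while socket 5 is
rejected because no heap carries that ID, even though it is below the
NUMA node limit.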
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 0607a3980..172c42f71 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -123,6 +123,13 @@ API Changes
   for any users of memseg-walk-related functions, as they will now have to skip
   externally allocated segments in most cases if the intent is to only iterate
   over internal DPDK memory.
+  ``socket_id`` parameter across the entire DPDK has gained additional meaning,
+  as some socket ID's will now be representing externally allocated memory. No
+  changes will be required for existing code as backwards compatibility will be
+  kept, and those who do not use this feature will not see these extra socket
+  ID's. Any new API's must not check socket ID parameters themselves, and must
+  instead leave it to the memory subsystem to decide whether socket ID is a
+  valid one.
 
 * mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
   functions were deprecated since 17.05 and are replaced by
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index a9cfa423f..09b06061d 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -651,7 +651,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1


* [dpdk-dev] [PATCH v9 05/21] flow_classify: do not check for invalid socket ID
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (4 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 06/21] pipeline: " Anatoly Burakov
                                 ` (15 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: Bernard Iremonger, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

External heaps will be assigned socket IDs that used to be
considered "invalid", and malloc can now verify whether a supplied
socket ID is in fact valid, rendering upper-bound checks on socket
ID parameters obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
---
 lib/librte_flow_classify/rte_flow_classify.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
index 4c3469da1..fb652a2b7 100644
--- a/lib/librte_flow_classify/rte_flow_classify.c
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -247,8 +247,7 @@ rte_flow_classifier_check_params(struct rte_flow_classifier_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_FLOW_CLASSIFY_LOG(ERR,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1


* [dpdk-dev] [PATCH v9 06/21] pipeline: do not check for invalid socket ID
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (5 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 05/21] flow_classify: " Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 07/21] sched: " Anatoly Burakov
                                 ` (14 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

External heaps will be assigned socket IDs that used to be
considered "invalid", and malloc can now verify whether a supplied
socket ID is in fact valid, rendering upper-bound checks on socket
ID parameters obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 lib/librte_pipeline/rte_pipeline.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_pipeline/rte_pipeline.c b/lib/librte_pipeline/rte_pipeline.c
index 0cb8b804e..2c047a8a4 100644
--- a/lib/librte_pipeline/rte_pipeline.c
+++ b/lib/librte_pipeline/rte_pipeline.c
@@ -178,8 +178,7 @@ rte_pipeline_check_params(struct rte_pipeline_params *params)
 	}
 
 	/* socket */
-	if ((params->socket_id < 0) ||
-	    (params->socket_id >= RTE_MAX_NUMA_NODES)) {
+	if (params->socket_id < 0) {
 		RTE_LOG(ERR, PIPELINE,
 			"%s: Incorrect value for parameter socket_id\n",
 			__func__);
-- 
2.17.1


* [dpdk-dev] [PATCH v9 07/21] sched: do not check for invalid socket ID
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (6 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 06/21] pipeline: " Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps Anatoly Burakov
                                 ` (13 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: Cristian Dumitrescu, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

External heaps will be assigned socket IDs that used to be
considered "invalid", and malloc can now verify whether a supplied
socket ID is in fact valid, rendering upper-bound checks on socket
ID parameters obsolete.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 lib/librte_sched/rte_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_sched/rte_sched.c b/lib/librte_sched/rte_sched.c
index 9269e5c71..d4e2189c7 100644
--- a/lib/librte_sched/rte_sched.c
+++ b/lib/librte_sched/rte_sched.c
@@ -329,7 +329,7 @@ rte_sched_port_check_params(struct rte_sched_port_params *params)
 		return -1;
 
 	/* socket */
-	if ((params->socket < 0) || (params->socket >= RTE_MAX_NUMA_NODES))
+	if (params->socket < 0)
 		return -3;
 
 	/* rate */
-- 
2.17.1


* [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (7 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 07/21] sched: " Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
                                 ` (12 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

We need a way to refer to external heaps. Heap IDs are used
internally, but the external API needs something more
user-friendly, so use a string to uniquely identify each heap.

This breaks the ABI, so document the change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
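Not part of the patch: a minimal standalone sketch of how default heaps
could be named after their socket, assuming a fixed-size name field.
`demo_heap`, `demo_heaps` and the helper names are hypothetical; only the
"socket_%i" pattern and the 32-byte limit come from this patch. Unlike
the patch, the sketch reports truncation instead of silently cutting the
name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN */

/* Hypothetical, trimmed-down heap descriptor. */
struct demo_heap {
	char name[DEMO_HEAP_NAME_MAX_LEN];
	int socket_id;
};

/* Stand-in for the heap array kept in the shared memory config. */
static struct demo_heap demo_heaps[2];

/* Give an internal heap its default name and socket ID.
 * Returns 0 on success, -1 if the name would not fit.
 */
static int
demo_heap_init_default(struct demo_heap *heap, int socket_id)
{
	int len;

	assert(heap != NULL);
	len = snprintf(heap->name, sizeof(heap->name), "socket_%i", socket_id);
	if (len < 0 || (size_t)len >= sizeof(heap->name))
		return -1;
	heap->socket_id = socket_id;
	return 0;
}

/* Compare a heap's name against a candidate, bounded by the field size. */
static int
demo_heap_name_matches(const struct demo_heap *heap, const char *name)
{
	assert(heap != NULL && name != NULL);
	return strncmp(heap->name, name, sizeof(heap->name)) == 0;
}
```

Because the name is derived from the socket ID, an application can find
the heap for NUMA node 1 by looking up "socket_1".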
 doc/guides/rel_notes/release_18_11.rst          |  2 ++
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 17 ++++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 172c42f71..754c41755 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -176,6 +176,8 @@ ABI Changes
            ID the malloc heap belongs to
          - structure ``rte_mem_config`` has had its ``malloc_heaps`` array
            resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
+         - structure ``rte_malloc_heap`` now has a ``heap_name`` member
+
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d432cef88..4a7e0eb1d 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	unsigned int socket_id;
 	size_t total_size;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 09b06061d..b28905817 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -131,7 +131,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1024,6 +1023,22 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign names to default DPDK heaps */
+		for (i = 0; i < rte_socket_count(); i++) {
+			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+			char heap_name[RTE_HEAP_NAME_MAX_LEN];
+			int socket_id = rte_socket_id_by_idx(i);
+
+			snprintf(heap_name, sizeof(heap_name) - 1,
+					"socket_%i", socket_id);
+			strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+			heap->socket_id = socket_id;
+		}
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1


* [dpdk-dev] [PATCH v9 09/21] malloc: add function to query socket ID of named heap
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (8 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 10/21] malloc: add function to check if socket is external Anatoly Burakov
                                 ` (11 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

External heaps will each have their own "fake" socket ID, so add
a function that maps a heap name to its socket ID.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
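Not part of the patch: a minimal standalone sketch of the name-to-socket
lookup, assuming a small static heap table. The `demo_*` names and table
contents are hypothetical, and plain `errno` stands in for `rte_errno`.
Locking is omitted; the patch itself takes the memory hotplug read lock
around the scan.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN */
#define DEMO_MAX_HEAPS 4          /* stand-in for RTE_MAX_HEAPS */

struct demo_heap {
	char name[DEMO_HEAP_NAME_MAX_LEN];
	int socket_id;
};

/* Two internal heaps and one external heap; the last slot is unused
 * and therefore has an empty name.
 */
static const struct demo_heap demo_heaps[DEMO_MAX_HEAPS] = {
	{ "socket_0", 0 },
	{ "socket_1", 1 },
	{ "my_ext_heap", 256 },
};

/* Length of s, but never scanning more than max bytes. */
static size_t
demo_bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	assert(s != NULL);
	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/* Return the socket ID of the named heap, or -1 with errno set to
 * EINVAL (NULL, empty or overlong name) or ENOENT (no such heap).
 */
static int
demo_heap_get_socket(const char *name)
{
	size_t len, i;

	if (name == NULL) {
		errno = EINVAL;
		return -1;
	}
	len = demo_bounded_len(name, DEMO_HEAP_NAME_MAX_LEN);
	if (len == 0 || len == DEMO_HEAP_NAME_MAX_LEN) {
		errno = EINVAL;
		return -1;
	}
	for (i = 0; i < DEMO_MAX_HEAPS; i++) {
		if (strncmp(name, demo_heaps[i].name,
				DEMO_HEAP_NAME_MAX_LEN) == 0)
			return demo_heaps[i].socket_id;
	}
	errno = ENOENT;
	return -1;
}
```

Rejecting empty names up front also keeps a lookup from matching an
unused slot, whose name is the empty string.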
 lib/librte_eal/common/include/rte_malloc.h | 14 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 37 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index a9fb7e452..8870732a6 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,20 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Find socket ID corresponding to a named heap.
+ *
+ * @param name
+ *   Heap name to find socket ID for
+ * @return
+ *   Socket ID in case of success (a non-negative number)
+ *   -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``name`` was NULL
+ *     ENOENT - heap identified by the name ``name`` was not found
+ */
+int __rte_experimental
+rte_malloc_heap_get_socket(const char *name);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72632da56..b807dfe09 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <sys/queue.h>
 
+#include <rte_errno.h>
 #include <rte_memcpy.h>
 #include <rte_memory.h>
 #include <rte_eal.h>
@@ -183,6 +184,42 @@ rte_malloc_dump_heaps(FILE *f)
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 }
 
+int
+rte_malloc_heap_get_socket(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int idx;
+	int ret;
+
+	if (name == NULL ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if (!strncmp(name, tmp->name, RTE_HEAP_NAME_MAX_LEN)) {
+			heap = tmp;
+			break;
+		}
+	}
+
+	if (heap != NULL) {
+		ret = heap->socket_id;
+	} else {
+		rte_errno = ENOENT;
+		ret = -1;
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bbb0..d8f9665b8 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_get_socket;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v9 10/21] malloc: add function to check if socket is external
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (9 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 11/21] malloc: allow creating malloc heaps Anatoly Burakov
                                 ` (10 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, alejandro.lucero

An API is needed to check whether a particular socket ID belongs
to an internal or an external heap. The primary user is the
mempool allocator, because the usual assumption that memory is
IOVA-contiguous in IOVA as VA mode does not hold for externally
allocated memory.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
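Not part of the patch: a minimal standalone sketch of the classification
and of the boolean logic the mempool change applies on top of it. The
`demo_*` names and the socket table are hypothetical; the two
expressions in `demo_plan()` follow the ones added to
rte_mempool_populate_default() in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_NUMA_NODES 8   /* stand-in for RTE_MAX_NUMA_NODES */
#define DEMO_SOCKET_ID_ANY (-1) /* stand-in for SOCKET_ID_ANY */

/* Hypothetical heap table: socket ID per heap. */
static const int demo_heap_sockets[] = { 0, 1, 256 };
#define DEMO_NUM_HEAPS \
	(sizeof(demo_heap_sockets) / sizeof(demo_heap_sockets[0]))

/* 1 = external, 0 = internal (or "any"), -1 = no heap has this ID. */
static int
demo_socket_is_external(const int *sockets, size_t n, int socket_id)
{
	size_t i;

	assert(sockets != NULL);
	if (socket_id == DEMO_SOCKET_ID_ANY)
		return 0;
	for (i = 0; i < n; i++) {
		if (sockets[i] == socket_id)
			return socket_id >= DEMO_MAX_NUMA_NODES;
	}
	return -1;
}

struct demo_populate_plan {
	bool no_pageshift; /* ignore page boundaries when placing objects */
	bool try_contig;   /* first try one IOVA-contiguous reservation */
};

/* IOVA as VA only implies contiguity for internal memory, and external
 * memory may be contiguous even when hugepages are disabled.
 */
static struct demo_populate_plan
demo_plan(bool no_contig, bool external, bool iova_as_va, bool has_hugepages)
{
	struct demo_populate_plan p;

	p.no_pageshift = no_contig || (!external && iova_as_va);
	p.try_contig = !no_contig && !p.no_pageshift &&
			(has_hugepages || external);
	return p;
}
```

For an external heap in IOVA as VA mode without hugepages, the plan is
to respect page boundaries and still attempt a contiguous reservation,
which is the case the commit message calls out.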
 lib/librte_eal/common/include/rte_malloc.h | 15 +++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 25 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 lib/librte_mempool/rte_mempool.c           | 22 ++++++++++++++++---
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 8870732a6..403271ddc 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -277,6 +277,21 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_get_socket(const char *name);
 
+/**
+ * Check if a given socket ID refers to externally allocated memory.
+ *
+ * @note Passing SOCKET_ID_ANY will return 0.
+ *
+ * @param socket_id
+ *   Socket ID to check
+ * @return
+ *   1 if socket ID refers to externally allocated memory
+ *   0 if socket ID refers to internal DPDK memory
+ *   -1 if socket ID is invalid
+ */
+int __rte_experimental
+rte_malloc_heap_socket_is_external(int socket_id);
+
 /**
  * Dump statistics.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b807dfe09..fa81d7862 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -220,6 +220,31 @@ rte_malloc_heap_get_socket(const char *name)
 	return ret;
 }
 
+int
+rte_malloc_heap_socket_is_external(int socket_id)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int idx;
+	int ret = -1;
+
+	if (socket_id == SOCKET_ID_ANY)
+		return 0;
+
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+	for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[idx];
+
+		if ((int)tmp->socket_id == socket_id) {
+			/* external memory always has large socket ID's */
+			ret = tmp->socket_id >= RTE_MAX_NUMA_NODES;
+			break;
+		}
+	}
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 /*
  * Print stats on memory type. If type is NULL, info on all types is printed
  */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d8f9665b8..bd60506af 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
 	rte_mem_event_callback_register;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2ed539f01..683b216f9 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -428,12 +428,18 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	rte_iova_t iova;
 	unsigned mz_id, n;
 	int ret;
-	bool no_contig, try_contig, no_pageshift;
+	bool no_contig, try_contig, no_pageshift, external;
 
 	ret = mempool_ops_alloc_once(mp);
 	if (ret != 0)
 		return ret;
 
+	/* check if we can retrieve a valid socket ID */
+	ret = rte_malloc_heap_socket_is_external(mp->socket_id);
+	if (ret < 0)
+		return -EINVAL;
+	external = ret;
+
 	/* mempool must not be populated */
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
@@ -481,9 +487,19 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	 * in one contiguous chunk as well (otherwise we might end up wasting a
 	 * 1G page on a 10MB memzone). If we fail to get enough contiguous
 	 * memory, then we'll go and reserve space page-by-page.
+	 *
+	 * We also have to take into account the fact that memory that we're
+	 * going to allocate from can belong to an externally allocated memory
+	 * area, in which case the assumption of IOVA as VA mode being
+	 * synonymous with IOVA contiguousness will not hold. We should also try
+	 * to go for contiguous memory even if we're in no-huge mode, because
+	 * external memory may in fact be IOVA-contiguous.
 	 */
-	no_pageshift = no_contig || rte_eal_iova_mode() == RTE_IOVA_VA;
-	try_contig = !no_contig && !no_pageshift && rte_eal_has_hugepages();
+	external = rte_malloc_heap_socket_is_external(mp->socket_id) == 1;
+	no_pageshift = no_contig ||
+			(!external && rte_eal_iova_mode() == RTE_IOVA_VA);
+	try_contig = !no_contig && !no_pageshift &&
+			(rte_eal_has_hugepages() || external);
 
 	if (no_pageshift) {
 		pg_sz = 0;
-- 
2.17.1


* [dpdk-dev] [PATCH v9 11/21] malloc: allow creating malloc heaps
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (10 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 10/21] malloc: add function to check if socket is external Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 12/21] malloc: allow destroying heaps Anatoly Burakov
                                 ` (9 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to create new malloc heaps. Their socket IDs start at
or above RTE_MAX_NUMA_NODES, to avoid clashing with internal
heaps.

This breaks the ABI, so document the change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst        |  2 +
 .../common/include/rte_eal_memconfig.h        |  3 ++
 lib/librte_eal/common/include/rte_malloc.h    | 19 +++++++
 lib/librte_eal/common/malloc_heap.c           | 37 +++++++++++++
 lib/librte_eal/common/malloc_heap.h           |  3 ++
 lib/librte_eal/common/rte_malloc.c            | 52 +++++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  1 +
 7 files changed, 117 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 754c41755..e7674adb9 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -177,6 +177,8 @@ ABI Changes
          - structure ``rte_mem_config`` has had its ``malloc_heaps`` array
            resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
          - structure ``rte_malloc_heap`` now has a ``heap_name`` member
+         - structure ``rte_mem_config`` has been extended to contain the next
+           socket ID for externally allocated segments
 
 
 Removed Items
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 7634bff5d..fc44c4e5f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
 	/* Heaps of Malloc */
 	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
+	/* next socket ID for external malloc heap */
+	int next_socket_id;
+
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
 	 */
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b28905817..00fdf54f7 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1019,6 +1023,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	uint32_t next_socket_id = mcfg->next_socket_id;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id;
+
+	/* we hold a global mem hotplug writelock, so it's safe to increment */
+	mcfg->next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
@@ -1026,6 +1060,9 @@ rte_eal_malloc_heap_init(void)
 	unsigned int i;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign min socket ID to external heaps */
+		mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
 		/* assign names to default DPDK heaps */
 		for (i = 0; i < rte_socket_count(); i++) {
 			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v9 12/21] malloc: allow destroying heaps
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (11 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 11/21] malloc: allow creating malloc heaps Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
                                 ` (8 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to destroy a specified heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 23 +++++++++
 lib/librte_eal/common/malloc_heap.c        | 22 ++++++++
 lib/librte_eal/common/malloc_heap.h        |  3 ++
 lib/librte_eal/common/rte_malloc.c         | 58 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 107 insertions(+)
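
Note for readers: the order of checks that the destroy path applies can be
modelled outside of DPDK. The following is a minimal sketch only, not the
patch's code: ``struct model_heap``, ``model_heap_destroy`` and
``MODEL_MAX_NUMA_NODES`` are hypothetical stand-ins, and the sketch returns a
negative errno value where the real API returns -1 and sets rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for DPDK definitions; the values are assumptions. */
#define MODEL_MAX_NUMA_NODES 8
#define MODEL_HEAP_NAME_LEN 32

struct model_heap {
	char name[MODEL_HEAP_NAME_LEN]; /* empty name marks an unused slot */
	unsigned int alloc_count;       /* allocations not yet freed */
	void *first;                    /* non-NULL while memory is attached */
	void *last;
	size_t total_size;
	int socket_id;
};

/* Returns 0 on success or a negative errno value describing the refusal. */
static int
model_heap_destroy(struct model_heap *heap)
{
	assert(heap != NULL);

	/* an unused slot is treated as "heap not found" */
	if (heap->name[0] == '\0')
		return -ENOENT;
	/* internal (per-NUMA-node) heaps are reserved */
	if (heap->socket_id < MODEL_MAX_NUMA_NODES)
		return -EPERM;
	/* outstanding allocations keep the heap alive */
	if (heap->alloc_count != 0)
		return -EBUSY;
	/* so do memory chunks that were never removed */
	if (heap->first != NULL || heap->last != NULL)
		return -EBUSY;

	/* wiping the slot also clears the name, freeing it for reuse */
	memset(heap, 0, sizeof(*heap));
	return 0;
}
```

In this model a second destroy of the same slot reports -ENOENT, because the
wiped slot no longer carries a name.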

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index e326529d0..309bbbcc9 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -282,6 +282,29 @@ rte_malloc_get_socket_stats(int socket,
 int __rte_experimental
 rte_malloc_heap_create(const char *heap_name);
 
+/**
+ * Destroys a previously created malloc heap with specified name.
+ *
+ * @note This function will return a failure result if not all memory allocated
+ *   from the heap has been freed back to the heap
+ *
+ * @note This function will return a failure result if not all memory segments
+ *   were removed from the heap prior to its destruction
+ *
+ * @param heap_name
+ *   Name of the heap to destroy.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     ENOENT - heap by the name of ``heap_name`` was not found
+ *     EPERM  - attempting to destroy reserved heap
+ *     EBUSY  - heap still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_destroy(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 00fdf54f7..ca774c96f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1053,6 +1053,28 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 	return 0;
 }
 
+int
+malloc_heap_destroy(struct malloc_heap *heap)
+{
+	if (heap->alloc_count != 0) {
+		RTE_LOG(ERR, EAL, "Heap is still in use\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->first != NULL || heap->last != NULL) {
+		RTE_LOG(ERR, EAL, "Heap still contains memory segments\n");
+		rte_errno = EBUSY;
+		return -1;
+	}
+	if (heap->total_size != 0)
+		RTE_LOG(ERR, EAL, "Total size not zero, heap is likely corrupt\n");
+
+	/* after this, the lock will be dropped */
+	memset(heap, 0, sizeof(*heap));
+
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index eebee16dc..75278da3c 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -36,6 +36,9 @@ malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 
+int
+malloc_heap_destroy(struct malloc_heap *heap);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 25967a7cb..286e748ef 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -313,6 +313,21 @@ rte_malloc_virt2iova(const void *addr)
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
 
+static struct malloc_heap *
+find_named_heap(const char *name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+		if (!strncmp(name, heap->name, RTE_HEAP_NAME_MAX_LEN))
+			return heap;
+	}
+	return NULL;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
@@ -363,3 +378,46 @@ rte_malloc_heap_create(const char *heap_name)
 
 	return ret;
 }
+
+int
+rte_malloc_heap_destroy(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Heap %s not found\n", heap_name);
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to destroy internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	/* sanity checks done, now we can destroy the heap */
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_destroy(heap);
+
+	/* if we failed, lock is still active */
+	if (ret < 0)
+		rte_spinlock_unlock(&heap->lock);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 376f33bbb..27aac5bea 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
 	rte_malloc_heap_create;
+	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v9 13/21] malloc: allow adding memory to named heaps
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (12 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 12/21] malloc: allow destroying heaps Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 14/21] malloc: allow removing memory from " Anatoly Burakov
                                 ` (7 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to add externally allocated memory to malloc heap. The
memory will be stored in memseg lists like regular DPDK memory.
Multiple segments are allowed within a heap. If an IOVA table is
not provided, IOVA addresses are filled in with RTE_BAD_IOVA.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 39 ++++++++++++
 lib/librte_eal/common/malloc_heap.c        | 74 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 51 +++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 169 insertions(+)
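
Note for readers: the per-page bookkeeping described above (one segment per
page, with a placeholder IOVA when no table is given) can be sketched without
DPDK. This is an illustration under stated assumptions, not the patch's code:
``struct model_seg``, ``model_fill_segs`` and ``MODEL_BAD_IOVA`` are
hypothetical names, and ``MODEL_BAD_IOVA`` is assumed to mirror RTE_BAD_IOVA
as an all-ones 64-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror RTE_BAD_IOVA: an all-ones address meaning "unknown". */
#define MODEL_BAD_IOVA UINT64_MAX

struct model_seg {
	void *addr;    /* virtual address of this page */
	uint64_t iova; /* IO address of this page, or MODEL_BAD_IOVA */
	size_t len;    /* always one page */
};

/*
 * Fill one segment record per page of [va_addr, va_addr + len).
 * The caller must provide room for len / page_sz records in segs.
 * Returns the number of records written, or -1 on invalid parameters.
 */
static int
model_fill_segs(struct model_seg *segs, void *va_addr, size_t len,
		const uint64_t iova_addrs[], unsigned int n_pages,
		size_t page_sz)
{
	unsigned int i, n;

	assert(segs != NULL);

	/* page size must be a non-zero power of two */
	if (va_addr == NULL || page_sz == 0 ||
			(page_sz & (page_sz - 1)) != 0)
		return -1;

	/* n_pages only matters when an IOVA table is supplied */
	n = (unsigned int)(len / page_sz);
	if (iova_addrs != NULL && n != n_pages)
		return -1;

	for (i = 0; i < n; i++) {
		segs[i].addr = (char *)va_addr + (size_t)i * page_sz;
		segs[i].iova = iova_addrs == NULL ?
				MODEL_BAD_IOVA : iova_addrs[i];
		segs[i].len = page_sz;
	}
	return (int)n;
}
```

As in the patch, a mismatch between ``len / page_sz`` and ``n_pages`` is only
an error when an IOVA table is actually passed in.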

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 309bbbcc9..fb5b6e2f7 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,45 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Add memory chunk to a heap with specified name.
+ *
+ * @note Multiple memory chunks can be added to the same heap
+ *
+ * @note Memory must be previously allocated for DPDK to be able to use it as a
+ *   malloc heap. Failing to do so will result in undefined behavior, up to and
+ *   including segmentation faults.
+ *
+ * @note Calling this function will erase any contents already present at the
+ *   supplied memory address.
+ *
+ * @param heap_name
+ *   Name of the heap to add memory chunk to
+ * @param va_addr
+ *   Start of virtual area to add to the heap
+ * @param len
+ *   Length of virtual area to add to the heap
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to add memory to a reserved heap
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ca774c96f..256c25edf 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1023,6 +1023,80 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	char fbarray_name[RTE_FBARRAY_NAME_LEN];
+	struct rte_memseg_list *msl = NULL;
+	struct rte_fbarray *arr;
+	size_t seg_len = n_pages * page_sz;
+	unsigned int i;
+
+	/* first, find a free memseg list */
+	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
+		struct rte_memseg_list *tmp = &mcfg->memsegs[i];
+		if (tmp->base_va == NULL) {
+			msl = tmp;
+			break;
+		}
+	}
+	if (msl == NULL) {
+		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
+			heap->name, va_addr);
+
+	/* create the backing fbarray */
+	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
+			sizeof(struct rte_memseg)) < 0) {
+		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
+		return -1;
+	}
+	arr = &msl->memseg_arr;
+
+	/* fbarray created, fill it up */
+	for (i = 0; i < n_pages; i++) {
+		struct rte_memseg *ms;
+
+		rte_fbarray_set_used(arr, i);
+		ms = rte_fbarray_get(arr, i);
+		ms->addr = RTE_PTR_ADD(va_addr, i * page_sz);
+		ms->iova = iova_addrs == NULL ? RTE_BAD_IOVA : iova_addrs[i];
+		ms->hugepage_sz = page_sz;
+		ms->len = page_sz;
+		ms->nchannel = rte_memory_get_nchannel();
+		ms->nrank = rte_memory_get_nrank();
+		ms->socket_id = heap->socket_id;
+	}
+
+	/* set up the memseg list */
+	msl->base_va = va_addr;
+	msl->page_sz = page_sz;
+	msl->socket_id = heap->socket_id;
+	msl->len = seg_len;
+	msl->version = 0;
+	msl->external = 1;
+
+	/* erase contents of new memory */
+	memset(va_addr, 0, seg_len);
+
+	/* now, add newly minted memory to the malloc heap */
+	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+
+	heap->total_size += seg_len;
+
+	/* all done! */
+	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
+			heap->name, va_addr);
+
+	return 0;
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 75278da3c..237ce9dc2 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,6 +39,10 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 286e748ef..acdbd92a2 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -328,6 +328,57 @@ find_named_heap(const char *name)
 	return NULL;
 }
 
+int
+rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	unsigned int n;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL ||
+			page_sz == 0 || !rte_is_power_of_2(page_sz) ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		/* hotplug lock is not held yet, so do not jump to unlock */
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot add memory to internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+	n = len / page_sz;
+	if (n != n_pages && iova_addrs != NULL) {
+		rte_errno = EINVAL;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
+			page_sz);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 27aac5bea..02254042c 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -321,6 +321,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_create;
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
+	rte_malloc_heap_memory_add;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1


* [dpdk-dev] [PATCH v9 14/21] malloc: allow removing memory from named heaps
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (13 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
                                 ` (6 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add an API to remove memory from specified heaps. Removal first
checks that all elements within the region are free, and that the
region is the original region that was added to the heap (by
comparing its length to the length of memory addressed by the
underlying memseg list).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++
 lib/librte_eal/common/malloc_heap.c        | 54 ++++++++++++++++++++++
 lib/librte_eal/common/malloc_heap.h        |  4 ++
 lib/librte_eal/common/rte_malloc.c         | 39 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 5 files changed, 125 insertions(+)
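
Note for readers: the two conditions from the commit message (the region is
entirely free, and it is the whole region that was originally added) can be
modelled as below. This is a simplified sketch with hypothetical names
(``struct model_elem``, ``model_can_remove``); it folds the memseg list length
into the element as ``chunk_len`` and omits the address-ordered early exit
that the patch uses while walking the list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum model_state { MODEL_FREE, MODEL_BUSY };

struct model_elem {
	struct model_elem *next;
	enum model_state state;
	size_t size;      /* size of this heap element */
	size_t chunk_len; /* length of the chunk this element belongs to */
};

/*
 * Decide whether the chunk starting at va_addr with length len may be
 * removed. Returns 0 if so, or a negative errno value explaining why not.
 */
static int
model_can_remove(struct model_elem *first, const void *va_addr, size_t len)
{
	struct model_elem *elem = first;

	assert(va_addr != NULL && len != 0);

	/* look for an element that starts exactly at va_addr */
	while (elem != NULL && (const void *)elem != va_addr)
		elem = elem->next;

	/* no such element, or the length is not that of the original chunk */
	if (elem == NULL || elem->chunk_len != len)
		return -ENOENT;

	/*
	 * A fully free chunk collapses into one free element spanning it;
	 * anything else means part of the chunk is still allocated.
	 */
	if (elem->state == MODEL_BUSY || elem->size != len)
		return -EBUSY;

	return 0;
}
```

The size comparison is what makes partial removal impossible: a free element
smaller than the chunk implies that a neighbouring element is still in use.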

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index fb5b6e2f7..40bae4478 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -302,6 +302,33 @@ int __rte_experimental
 rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+/**
+ * Remove memory chunk from heap with specified name.
+ *
+ * @note Memory chunk being removed must be the same as one that was added;
+ *   partially removing memory chunks is not supported
+ *
+ * @note Memory area must not contain any allocated elements to allow its
+ *   removal from the heap
+ *
+ * @param heap_name
+ *   Name of the heap to remove memory from
+ * @param va_addr
+ *   Virtual address to remove from the heap
+ * @param len
+ *   Length of virtual area to remove from the heap
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to remove memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ *     EBUSY  - memory chunk still contains data
+ */
+int __rte_experimental
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 256c25edf..adc1669aa 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1023,6 +1023,32 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+static int
+destroy_seg(struct malloc_elem *elem, size_t len)
+{
+	struct malloc_heap *heap = elem->heap;
+	struct rte_memseg_list *msl;
+
+	msl = elem->msl;
+
+	/* this element can be removed */
+	malloc_elem_free_list_remove(elem);
+	malloc_elem_hide_region(elem, elem, len);
+
+	heap->total_size -= len;
+
+	memset(elem, 0, sizeof(*elem));
+
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
@@ -1097,6 +1123,34 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	return 0;
 }
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len)
+{
+	struct malloc_elem *elem = heap->first;
+
+	/* find element with specified va address */
+	while (elem != NULL && elem != va_addr) {
+		elem = elem->next;
+		/* stop if we've blown past our VA */
+		if (elem > (struct malloc_elem *)va_addr) {
+			rte_errno = ENOENT;
+			return -1;
+		}
+	}
+	/* check if element was found */
+	if (elem == NULL || elem->msl->len != len) {
+		rte_errno = ENOENT;
+		return -1;
+	}
+	/* if element's size is not equal to segment len, segment is busy */
+	if (elem->state == ELEM_BUSY || elem->size != len) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+	return destroy_seg(elem, len);
+}
+
 int
 malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
 {
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 237ce9dc2..e48996d52 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -43,6 +43,10 @@ int
 malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
 
+int
+malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
+		size_t len);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index acdbd92a2..bfc49d0b7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -379,6 +379,45 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		/* cannot remove memory from internal heaps */
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	rte_spinlock_lock(&heap->lock);
+	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
+	rte_spinlock_unlock(&heap->lock);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 02254042c..8c66d0be9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
 	rte_mem_alloc_validator_unregister;
-- 
2.17.1


* [dpdk-dev] [PATCH v9 15/21] malloc: allow attaching to external memory chunks
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (14 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 14/21] malloc: allow removing memory from " Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 16/21] malloc: allow detaching from external memory Anatoly Burakov
                                 ` (5 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

In order to use external memory in multiple processes, we need to
attach to primary process's memseg lists, so add a new API to do
that. It is the responsibility of the user to ensure that memory
is accessible and that it has been previously added to the malloc
heap by another process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/include/rte_malloc.h | 28 ++++++++
 lib/librte_eal/common/rte_malloc.c         | 83 ++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 112 insertions(+)
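
Note for readers: the attach path locates the memseg list to attach to by
walking all lists with a callback that matches on base address and length,
and it reports "not found" unless the callback says otherwise. A minimal
model of that pattern follows. All names (``struct model_msl``,
``model_walk``, ``model_attach``) are hypothetical, and the real callback
additionally attaches to the backing fbarray, which this sketch leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_msl {
	void *base_va;        /* NULL marks an unused list slot */
	size_t page_sz;
	unsigned int n_pages;
};

struct model_walk_arg {
	void *va_addr;
	size_t len;
	int result; /* 0 on match, negative errno otherwise */
};

typedef int (*model_walk_cb)(const struct model_msl *msl, void *arg);

/* Callback: returning 1 stops the walk, returning 0 continues it. */
static int
model_match(const struct model_msl *msl, void *arg)
{
	struct model_walk_arg *wa = arg;

	if (msl->base_va == wa->va_addr &&
			msl->page_sz * msl->n_pages == wa->len) {
		wa->result = 0;
		return 1;
	}
	return 0;
}

/* Visit every used list until the callback returns non-zero. */
static int
model_walk(const struct model_msl *lists, unsigned int n,
		model_walk_cb cb, void *arg)
{
	unsigned int i;

	assert(cb != NULL);
	for (i = 0; i < n; i++) {
		int ret;

		if (lists[i].base_va == NULL)
			continue;
		ret = cb(&lists[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

static int
model_attach(const struct model_msl *lists, unsigned int n,
		void *va_addr, size_t len)
{
	/* fail unless the callback explicitly reports a match */
	struct model_walk_arg wa = { va_addr, len, -ENOENT };

	model_walk(lists, n, model_match, &wa);
	return wa.result;
}
```

Pre-loading the result with -ENOENT means a walk that visits nothing, or
matches nothing, fails without any extra bookkeeping.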

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 40bae4478..793f9473a 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -268,6 +268,10 @@ rte_malloc_get_socket_stats(int socket,
  *
  * @note Multiple memory chunks can be added to the same heap
  *
+ * @note Before this memory can be accessed in other processes, each of
+ *   those processes must attach to it by calling
+ *   ``rte_malloc_heap_memory_attach``.
+ *
  * @note Memory must be previously allocated for DPDK to be able to use it as a
  *   malloc heap. Failing to do so will result in undefined behavior, up to and
  *   including segmentation faults.
@@ -329,6 +333,30 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 int __rte_experimental
 rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Attach to a chunk of external memory already added by another process.
+ *
+ * @note This function must be called before any attempt is made to use an
+ *   already existing external memory chunk. This function does *not* need to
+ *   be called if a call to ``rte_malloc_heap_memory_add`` was made in the
+ *   current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to attach to
+ * @param len
+ *   Length of memory chunk to attach to
+ * @return
+ *   0 on successful attach
+ *   -1 on unsuccessful attach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to attach memory to a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index bfc49d0b7..5078235b1 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -418,6 +418,89 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+struct sync_mem_walk_arg {
+	void *va_addr;
+	size_t len;
+	int result;
+};
+
+static int
+attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct sync_mem_walk_arg *wa = arg;
+	size_t len = msl->page_sz * msl->memseg_arr.len;
+
+	if (msl->base_va == wa->va_addr &&
+			len == wa->len) {
+		struct rte_memseg_list *found_msl;
+		int msl_idx, ret;
+
+		/* msl is const */
+		msl_idx = msl - mcfg->memsegs;
+		found_msl = &mcfg->memsegs[msl_idx];
+
+		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+
+		if (ret < 0)
+			wa->result = -rte_errno;
+		else
+			wa->result = 0;
+		return 1;
+	}
+	return 0;
+}
+
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	struct sync_mem_walk_arg wa;
+	int ret;
+
+	if (heap_name == NULL || va_addr == NULL || len == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our heap */
+	heap = find_named_heap(heap_name);
+	if (heap == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	/* we shouldn't be able to attach to internal heaps */
+	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
+		rte_errno = EPERM;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* find corresponding memseg list to attach to */
+	wa.va_addr = va_addr;
+	wa.len = len;
+	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+
+	/* we're already holding a read lock */
+	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+
+	if (wa.result < 0) {
+		rte_errno = -wa.result;
+		ret = -1;
+	} else {
+		ret = 0;
+	}
+unlock:
+	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 8c66d0be9..920852042 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -322,6 +322,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_destroy;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
+	rte_malloc_heap_memory_attach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1
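
For readers following the parameter checks at the top of
rte_malloc_heap_memory_attach() above, here is a minimal standalone
sketch of the heap name test. It is not DPDK code: HEAP_NAME_MAX_LEN
and heap_name_is_valid() are hypothetical names, and the limit of 32 is
an assumed value chosen only for illustration.

```c
/* Request POSIX.1-2008 declarations so that strnlen() is visible. */
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <string.h>

/* Assumed limit, for illustration only; the real value comes from DPDK. */
#define HEAP_NAME_MAX_LEN 32

/* Return 1 if the name would pass the check, 0 if it would yield EINVAL. */
static int
heap_name_is_valid(const char *name)
{
	size_t n;

	if (name == NULL)
		return 0;

	/* strnlen() never reads more than HEAP_NAME_MAX_LEN bytes */
	n = strnlen(name, HEAP_NAME_MAX_LEN);

	/* n == 0: empty name; n == limit: no terminator within the limit */
	return n != 0 && n != HEAP_NAME_MAX_LEN;
}
```

Under these assumptions a name of exactly HEAP_NAME_MAX_LEN characters
is rejected, because the terminating NUL would not fit in a buffer of
that size.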


* [dpdk-dev] [PATCH v9 16/21] malloc: allow detaching from external memory
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (15 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 17/21] malloc: enable event callbacks for " Anatoly Burakov
                                 ` (4 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add API to detach from existing chunk of external memory in a
process.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
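
A note on the shape of this change: attach and detach share one walk
that looks for a region by exact start address and length, and a bool
selects the operation. The standalone sketch below models that pattern
only. struct region, sync_region() and the other names are hypothetical
stand-ins, not the EAL structures or functions touched by this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memseg list: an address range and a flag. */
struct region {
	void *base;
	size_t len;
	bool attached;
};

struct sync_arg {
	void *va_addr;
	size_t len;
	bool attach;
	int result;
};

/* Walk callback: return 1 to stop the walk, 0 to keep looking. */
static int
sync_walk(struct region *r, struct sync_arg *wa)
{
	if (r->base != wa->va_addr || r->len != wa->len)
		return 0;
	r->attached = wa->attach;
	wa->result = 0;
	return 1;
}

static int
sync_region(struct region *tbl, size_t n, void *va, size_t len, bool attach)
{
	/* fail unless a matching region explicitly reports success */
	struct sync_arg wa = { va, len, attach, -ENOENT };
	size_t i;

	for (i = 0; i < n; i++)
		if (sync_walk(&tbl[i], &wa))
			break;
	return wa.result;
}

static int
region_attach(struct region *tbl, size_t n, void *va, size_t len)
{
	return sync_region(tbl, n, va, len, true);
}

static int
region_detach(struct region *tbl, size_t n, void *va, size_t len)
{
	return sync_region(tbl, n, va, len, false);
}
```

Because the result starts out as -ENOENT, a request whose address or
length does not match any region exactly fails without any extra
bookkeeping in the caller.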
 lib/librte_eal/common/include/rte_malloc.h | 27 +++++++++++++++++++
 lib/librte_eal/common/rte_malloc.c         | 31 +++++++++++++++++-----
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 793f9473a..7249e6aae 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -315,6 +315,9 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
  * @note Memory area must not contain any allocated elements to allow its
  *   removal from the heap
  *
+ * @note All other processes must detach from the memory chunk prior to it being
+ *   removed from the heap.
+ *
  * @param heap_name
  *   Name of the heap to remove memory from
  * @param va_addr
@@ -357,6 +360,30 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len);
 int __rte_experimental
 rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len);
 
+/**
+ * Detach from a chunk of external memory in secondary process.
+ *
+ * @note This function must be called before any attempt is made to remove
+ *   external memory from the heap in another process. This function does *not*
+ *   need to be called if ``rte_malloc_heap_memory_remove`` will be called in
+ *   the current process.
+ *
+ * @param heap_name
+ *   Heap name to which this chunk of memory belongs
+ * @param va_addr
+ *   Start address of memory chunk to detach from
+ * @param len
+ *   Length of memory chunk to detach from
+ * @return
+ *   0 on successful detach
+ *   -1 on unsuccessful detach, with rte_errno set to indicate cause for error:
+ *     EINVAL - one of the parameters was invalid
+ *     EPERM  - attempted to detach memory from a reserved heap
+ *     ENOENT - heap or memory chunk was not found
+ */
+int __rte_experimental
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len);
+
 /**
  * Creates a new empty malloc heap with a specified name.
  *
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 5078235b1..72e42b337 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -422,10 +422,11 @@ struct sync_mem_walk_arg {
 	void *va_addr;
 	size_t len;
 	int result;
+	bool attach;
 };
 
 static int
-attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
+sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct sync_mem_walk_arg *wa = arg;
@@ -440,7 +441,10 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		if (wa->attach)
+			ret = rte_fbarray_attach(&found_msl->memseg_arr);
+		else
+			ret = rte_fbarray_detach(&found_msl->memseg_arr);
 
 		if (ret < 0)
 			wa->result = -rte_errno;
@@ -451,8 +455,8 @@ attach_mem_walk(const struct rte_memseg_list *msl, void *arg)
 	return 0;
 }
 
-int
-rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+static int
+sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
@@ -475,20 +479,21 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 		ret = -1;
 		goto unlock;
 	}
-	/* we shouldn't be able to attach to internal heaps */
+	/* we shouldn't be able to sync to internal heaps */
 	if (heap->socket_id < RTE_MAX_NUMA_NODES) {
 		rte_errno = EPERM;
 		ret = -1;
 		goto unlock;
 	}
 
-	/* find corresponding memseg list to attach to */
+	/* find corresponding memseg list to sync to */
 	wa.va_addr = va_addr;
 	wa.len = len;
 	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
+	wa.attach = attach;
 
 	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(attach_mem_walk, &wa);
+	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
 
 	if (wa.result < 0) {
 		rte_errno = -wa.result;
@@ -501,6 +506,18 @@ rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
+int
+rte_malloc_heap_memory_attach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, true);
+}
+
+int
+rte_malloc_heap_memory_detach(const char *heap_name, void *va_addr, size_t len)
+{
+	return sync_memory(heap_name, va_addr, len, false);
+}
+
 int
 rte_malloc_heap_create(const char *heap_name)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 920852042..30583eef2 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -323,6 +323,7 @@ EXPERIMENTAL {
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_memory_add;
 	rte_malloc_heap_memory_attach;
+	rte_malloc_heap_memory_detach;
 	rte_malloc_heap_memory_remove;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v9 17/21] malloc: enable event callbacks for external memory
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (16 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 16/21] malloc: allow detaching from external memory Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 18/21] test: add unit tests for external memory support Anatoly Burakov
                                 ` (3 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: Hemant Agrawal, Shreyansh Jain, Maxime Coquelin, Tiwei Bie,
	Zhihong Wang, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, thomas, shahafs, arybchenko, alejandro.lucero

When adding or removing external memory from the memory map, there
may be actions that need to be taken on account of this memory (e.g.
DMA mapping). Add support for triggering callbacks when adding,
removing, attaching or detaching external memory.

Some memory event callback handlers will need additional logic to
handle external memory regions. For example, virtio callback has to
completely ignore externally allocated memory, because there is no
way to find file descriptors backing the memory address in a
generic fashion. All other callbacks have also been adjusted to
handle RTE_BAD_IOVA as IOVA address, as this is one of the expected
use cases for external memory support.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
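
To illustrate the RTE_BAD_IOVA handling described above, here is a
standalone sketch of a callback loop that walks the segments of a
region and skips those without a usable IOVA. struct seg, BAD_IOVA and
count_mappable() are hypothetical stand-ins; the all-ones value is an
assumption made for the sketch, and the counter merely marks the place
where a real handler would map or unmap the segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RTE_BAD_IOVA; the all-ones value is an assumption. */
#define BAD_IOVA UINT64_MAX

/* Hypothetical, simplified segment descriptor. */
struct seg {
	uint64_t iova;
	size_t len;
};

/*
 * Walk the segments covering `len` bytes and count those that a handler
 * could DMA-map. The segments are assumed to be contiguous, non-empty and
 * to cover the whole region.
 */
static size_t
count_mappable(const struct seg *ms, size_t len)
{
	size_t cur_len = 0, mapped = 0;

	while (cur_len < len) {
		if (ms->iova != BAD_IOVA)
			mapped++; /* a real handler would map or unmap here */
		/* always advance, even when the segment is skipped */
		cur_len += ms->len;
		++ms;
	}
	return mapped;
}
```

The point to keep in mind is that a skipped segment must still advance
the cursor, otherwise the loop never terminates.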
 drivers/bus/fslmc/fslmc_vfio.c                |  7 +++++
 .../net/virtio/virtio_user/virtio_user_dev.c  |  6 +++++
 lib/librte_eal/common/malloc_heap.c           |  7 +++++
 lib/librte_eal/common/rte_malloc.c            | 27 ++++++++++++++++---
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 10 +++++--
 5 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index cb33dd891..493b6e5be 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -221,6 +221,13 @@ fslmc_memevent_cb(enum rte_mem_event type, const void *addr, size_t len,
 					"alloc" : "dealloc",
 				va, virt_addr, iova_addr, map_len);
 
+		/* iova_addr may be set to RTE_BAD_IOVA */
+		if (iova_addr == RTE_BAD_IOVA) {
+			DPAA2_BUS_DEBUG("Segment has invalid iova, skipping\n");
+			cur_len += map_len;
+			continue;
+		}
+
 		if (type == RTE_MEM_EVENT_ALLOC)
 			ret = fslmc_map_dma(virt_addr, iova_addr, map_len);
 		else
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 55a82e4b0..a185aed34 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -301,8 +301,14 @@ virtio_user_mem_event_cb(enum rte_mem_event type __rte_unused,
 						 void *arg)
 {
 	struct virtio_user_dev *dev = arg;
+	struct rte_memseg_list *msl;
 	uint16_t i;
 
+	/* ignore externally allocated memory */
+	msl = rte_mem_virt2memseg_list(addr);
+	if (msl->external)
+		return;
+
 	pthread_mutex_lock(&dev->mutex);
 
 	if (dev->started == false)
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index adc1669aa..08ec75377 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1031,6 +1031,9 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 
 	msl = elem->msl;
 
+	/* notify all subscribers that a memory area is going to be removed */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, elem, len);
+
 	/* this element can be removed */
 	malloc_elem_free_list_remove(elem);
 	malloc_elem_hide_region(elem, elem, len);
@@ -1120,6 +1123,10 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
 			heap->name, va_addr);
 
+	/* notify all subscribers that a new memory area has been added */
+	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+			va_addr, seg_len);
+
 	return 0;
 }
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 72e42b337..2c19c2f87 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -25,6 +25,7 @@
 #include <rte_malloc.h>
 #include "malloc_elem.h"
 #include "malloc_heap.h"
+#include "eal_memalloc.h"
 
 
 /* Free the memory space back to heap */
@@ -441,15 +442,29 @@ sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
 		msl_idx = msl - mcfg->memsegs;
 		found_msl = &mcfg->memsegs[msl_idx];
 
-		if (wa->attach)
+		if (wa->attach) {
 			ret = rte_fbarray_attach(&found_msl->memseg_arr);
-		else
+		} else {
+			/* notify all subscribers that a memory area is about to
+			 * be removed
+			 */
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+					msl->base_va, msl->len);
 			ret = rte_fbarray_detach(&found_msl->memseg_arr);
+		}
 
-		if (ret < 0)
+		if (ret < 0) {
 			wa->result = -rte_errno;
-		else
+		} else {
+			/* notify all subscribers that a new memory area was
+			 * added
+			 */
+			if (wa->attach)
+				eal_memalloc_mem_event_notify(
+						RTE_MEM_EVENT_ALLOC,
+						msl->base_va, msl->len);
 			wa->result = 0;
+		}
 		return 1;
 	}
 	return 0;
@@ -499,6 +514,10 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 		rte_errno = -wa.result;
 		ret = -1;
 	} else {
+		/* notify all subscribers that a new memory area was added */
+		if (attach)
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+					va_addr, len);
 		ret = 0;
 	}
 unlock:
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index fddbc3b54..d7268e4ce 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -509,7 +509,7 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	msl = rte_mem_virt2memseg_list(addr);
 
 	/* for IOVA as VA mode, no need to care for IOVA addresses */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
+	if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0) {
 		uint64_t vfio_va = (uint64_t)(uintptr_t)addr;
 		if (type == RTE_MEM_EVENT_ALLOC)
 			vfio_dma_mem_map(default_vfio_cfg, vfio_va, vfio_va,
@@ -523,13 +523,19 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 	/* memsegs are contiguous in memory */
 	ms = rte_mem_virt2memseg(addr, msl);
 	while (cur_len < len) {
+		/* some memory segments may have invalid IOVA */
+		if (ms->iova == RTE_BAD_IOVA) {
+			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
+					ms->addr);
+			goto next;
+		}
 		if (type == RTE_MEM_EVENT_ALLOC)
 			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
 					ms->iova, ms->len, 1);
 		else
 			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
 					ms->iova, ms->len, 0);
-
+next:
 		cur_len += ms->len;
 		++ms;
 	}
-- 
2.17.1


* [dpdk-dev] [PATCH v9 18/21] test: add unit tests for external memory support
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (17 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 17/21] malloc: enable event callbacks for " Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 19/21] app/testpmd: add support for external memory Anatoly Burakov
                                 ` (2 subsequent siblings)
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add simple unit tests to test external memory support.
The tests are pretty basic and mostly consist of checking
if invalid API calls are handled correctly, plus a simple
allocation/deallocation test for malloc and memzone.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
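
The allocation test depends on an IOVA table with one entry per page,
laid out contiguously so that an IOVA-contiguous memzone can be
reserved. The standalone sketch below shows how such a table can be
built and checked. fill_iova_table() and iova_table_is_contiguous() are
hypothetical helpers and are not part of the test file added here.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill one IOVA entry per page, starting at `base`. */
static void
fill_iova_table(uint64_t *iova, size_t n_pages, uint64_t base, size_t pgsz)
{
	size_t i;

	for (i = 0; i < n_pages; i++)
		iova[i] = base + (uint64_t)i * pgsz;
}

/* Return 1 if every entry follows the previous one by exactly one page. */
static int
iova_table_is_contiguous(const uint64_t *iova, size_t n_pages, size_t pgsz)
{
	size_t i;

	for (i = 1; i < n_pages; i++)
		if (iova[i] != iova[i - 1] + pgsz)
			return 0;
	return 1;
}
```

With 4K pages and a 4M area the table has 1024 entries, which matches
the sizes used by the test.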
 test/test/Makefile            |   1 +
 test/test/autotest_data.py    |  14 +-
 test/test/meson.build         |   1 +
 test/test/test_external_mem.c | 389 ++++++++++++++++++++++++++++++++++
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 test/test/test_external_mem.c

diff --git a/test/test/Makefile b/test/test/Makefile
index dcea4410d..5d8b1dcb0 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -71,6 +71,7 @@ SRCS-y += test_bitmap.c
 SRCS-y += test_reciprocal_division.c
 SRCS-y += test_reciprocal_division_perf.c
 SRCS-y += test_fbarray.c
+SRCS-y += test_external_mem.c
 
 SRCS-y += test_ring.c
 SRCS-y += test_ring_perf.c
diff --git a/test/test/autotest_data.py b/test/test/autotest_data.py
index f68d9b111..51f8e1689 100644
--- a/test/test/autotest_data.py
+++ b/test/test/autotest_data.py
@@ -477,10 +477,16 @@
         "Report":  None,
     },
     {
-        "Name":    "Fbarray autotest",
-        "Command": "fbarray_autotest",
-        "Func":    default_autotest,
-        "Report":  None,
+	"Name":    "Fbarray autotest",
+	"Command": "fbarray_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
+    },
+    {
+	"Name":    "External memory autotest",
+	"Command": "external_mem_autotest",
+	"Func":    default_autotest,
+	"Report":  None,
     },
     #
     #Please always keep all dump tests at the end and together!
diff --git a/test/test/meson.build b/test/test/meson.build
index bacb5b144..6a71ee0d3 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -164,6 +164,7 @@ test_names = [
 	'eventdev_common_autotest',
 	'eventdev_octeontx_autotest',
 	'eventdev_sw_autotest',
+	'external_mem_autotest',
 	'func_reentrancy_autotest',
 	'flow_classify_autotest',
 	'hash_scaling_autotest',
diff --git a/test/test/test_external_mem.c b/test/test/test_external_mem.c
new file mode 100644
index 000000000..d0837aa35
--- /dev/null
+++ b/test/test/test_external_mem.c
@@ -0,0 +1,389 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+
+#include <rte_common.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_malloc.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+
+#include "test.h"
+
+#define EXTERNAL_MEM_SZ (RTE_PGSIZE_4K << 10) /* 4M of data */
+
+static int
+test_invalid_param(void *addr, size_t len, size_t pgsz, rte_iova_t *iova,
+		int n_pages)
+{
+	static const char * const names[] = {
+		NULL, /* NULL name */
+		"",   /* empty name */
+		"this heap name is definitely way too long to be valid"
+	};
+	const char *valid_name = "valid heap name";
+	unsigned int i;
+
+	/* check invalid name handling */
+	for (i = 0; i < RTE_DIM(names); i++) {
+		const char *name = names[i];
+
+		/* these calls may fail for other reasons, so check errno */
+		if (rte_malloc_heap_create(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Created heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_destroy(name) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Destroyed heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_get_socket(name) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Found socket for heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_add(name, addr, len,
+				NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+			printf("%s():%i: Added memory to heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_remove(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Removed memory from heap with invalid name\n",
+					__func__, __LINE__);
+			goto fail;
+		}
+
+		if (rte_malloc_heap_memory_attach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Attached memory to heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (rte_malloc_heap_memory_detach(name, addr, len) >= 0 ||
+				rte_errno != EINVAL) {
+			printf("%s():%i: Detached memory from heap with invalid name\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* do same as above, but with a valid heap name */
+
+	/* skip create call */
+	if (rte_malloc_heap_destroy(valid_name) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Destroyed non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_get_socket(valid_name) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Found socket for non-existent heap\n",
+				__func__, __LINE__);
+		goto fail;
+	}
+
+	/* these calls may fail for other reasons, so check errno */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != ENOENT) {
+		printf("%s():%i: Added memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_remove(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Removed memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Attached memory to non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, len) >= 0 ||
+			rte_errno != ENOENT) {
+		printf("%s():%i: Detached memory from non-existent heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* create a valid heap but test other invalid parameters */
+	if (rte_malloc_heap_create(valid_name) != 0) {
+		printf("%s():%i: Failed to create valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero length */
+	if (rte_malloc_heap_memory_add(valid_name, addr, 0,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, addr, 0) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* zero address */
+	if (rte_malloc_heap_memory_add(valid_name, NULL, len,
+			NULL, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_remove(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Removed memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	if (rte_malloc_heap_memory_attach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Attached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_detach(valid_name, NULL, len) >= 0 ||
+			rte_errno != EINVAL) {
+		printf("%s():%i: Detached memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* wrong page count */
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, 0, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages - 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_memory_add(valid_name, addr, len,
+			iova, n_pages + 1, pgsz) >= 0 || rte_errno != EINVAL) {
+		printf("%s():%i: Added memory with invalid parameters\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* tests passed, destroy heap */
+	if (rte_malloc_heap_destroy(valid_name) != 0) {
+		printf("%s():%i: Failed to destroy valid heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	return 0;
+fail:
+	rte_malloc_heap_destroy(valid_name);
+	return -1;
+}
+
+static int
+test_basic(void *addr, size_t len, size_t pgsz, rte_iova_t *iova, int n_pages)
+{
+	const char *heap_name = "heap";
+	void *ptr = NULL;
+	int socket_id, i;
+	const struct rte_memzone *mz = NULL;
+
+	/* create heap */
+	if (rte_malloc_heap_create(heap_name) != 0) {
+		printf("%s():%i: Failed to create malloc heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* get socket ID corresponding to this heap */
+	socket_id = rte_malloc_heap_get_socket(heap_name);
+	if (socket_id < 0) {
+		printf("%s():%i: cannot find socket for external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* heap is empty, so any allocation should fail */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr != NULL) {
+		printf("%s():%i: Allocated from empty heap\n", __func__,
+			__LINE__);
+		goto fail;
+	}
+
+	/* add memory to heap */
+	if (rte_malloc_heap_memory_add(heap_name, addr, len,
+			iova, n_pages, pgsz) != 0) {
+		printf("%s():%i: Failed to add memory to heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check that we can get this memory from EAL now */
+	for (i = 0; i < n_pages; i++) {
+		const struct rte_memseg *ms;
+		void *cur = RTE_PTR_ADD(addr, pgsz * i);
+
+		ms = rte_mem_virt2memseg(cur, NULL);
+		if (ms == NULL) {
+			printf("%s():%i: Failed to retrieve memseg for external mem\n",
+				__func__, __LINE__);
+			goto fail;
+		}
+		if (ms->addr != cur) {
+			printf("%s():%i: VA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+		if (ms->iova != iova[i]) {
+			printf("%s():%i: IOVA mismatch\n", __func__, __LINE__);
+			goto fail;
+		}
+	}
+
+	/* allocate - this now should succeed */
+	ptr = rte_malloc_socket("EXTMEM", 64, 0, socket_id);
+	if (ptr == NULL) {
+		printf("%s():%i: Failed to allocate from external heap\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* check if address is in expected range */
+	if (ptr < addr || ptr >= RTE_PTR_ADD(addr, len)) {
+		printf("%s():%i: Allocated from unexpected address space\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* we've allocated something - removing memory should fail */
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) >= 0 ||
+			rte_errno != EBUSY) {
+		printf("%s():%i: Removing memory succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) >= 0 || rte_errno != EBUSY) {
+		printf("%s():%i: Destroying heap succeeded when memory is not free\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	/* try allocating an IOVA-contiguous memzone - this should succeed
+	 * because we've set up a contiguous IOVA table.
+	 */
+	mz = rte_memzone_reserve("heap_test", pgsz * 2, socket_id,
+			RTE_MEMZONE_IOVA_CONTIG);
+	if (mz == NULL) {
+		printf("%s():%i: Failed to reserve memzone\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	rte_malloc_dump_stats(stdout, NULL);
+	rte_malloc_dump_heaps(stdout);
+
+	/* free memory - removing it should now succeed */
+	rte_free(ptr);
+	ptr = NULL;
+
+	rte_memzone_free(mz);
+	mz = NULL;
+
+	if (rte_malloc_heap_memory_remove(heap_name, addr, len) != 0) {
+		printf("%s():%i: Removing memory from heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+	if (rte_malloc_heap_destroy(heap_name) != 0) {
+		printf("%s():%i: Destroying heap failed\n",
+			__func__, __LINE__);
+		goto fail;
+	}
+
+	return 0;
+fail:
+	rte_memzone_free(mz);
+	rte_free(ptr);
+	/* even if something failed, attempt to clean up */
+	rte_malloc_heap_memory_remove(heap_name, addr, len);
+	rte_malloc_heap_destroy(heap_name);
+
+	return -1;
+}
+
+/* we need to test attach/detach in secondary processes. */
+static int
+test_external_mem(void)
+{
+	size_t len = EXTERNAL_MEM_SZ;
+	size_t pgsz = RTE_PGSIZE_4K;
+	rte_iova_t iova[len / pgsz];
+	void *addr;
+	int ret, n_pages;
+
+	/* create external memory area */
+	n_pages = RTE_DIM(iova);
+	addr = mmap(NULL, len, PROT_WRITE | PROT_READ,
+			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (addr == MAP_FAILED) {
+		printf("%s():%i: Failed to create dummy memory area\n",
+			__func__, __LINE__);
+		return -1;
+	}
+	for (int i = 0; i < n_pages; i++) {
+		/* arbitrary IOVA */
+		rte_iova_t tmp = 0x100000000 + i * pgsz;
+		iova[i] = tmp;
+	}
+
+	ret = test_invalid_param(addr, len, pgsz, iova, n_pages);
+	ret |= test_basic(addr, len, pgsz, iova, n_pages);
+
+	munmap(addr, len);
+
+	return ret;
+}
+
+REGISTER_TEST_COMMAND(external_mem_autotest, test_external_mem);
-- 
2.17.1


* [dpdk-dev] [PATCH v9 19/21] app/testpmd: add support for external memory
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (18 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 18/21] test: add unit tests for external memory support Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 14:05                 ` Iremonger, Bernard
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 20/21] doc: add external memory feature to the release notes Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
  21 siblings, 1 reply; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: Wenzhuo Lu, Jingjing Wu, Bernard Iremonger, John McNamara,
	Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, bruce.richardson, thomas, shreyansh.jain, shahafs,
	arybchenko, alejandro.lucero

Currently, mempools can only be allocated using either native
DPDK memory or anonymous memory. This patch adds two new
methods to allocate a mempool using external memory (regular or
hugepage memory), and documents them in the testpmd user
guide.

It adds a new flag "--mp-alloc", with four possible values:
native (use regular DPDK allocator), anon (use anonymous
mempool), xmem (use externally allocated memory area), and
xmemhuge (use externally allocated hugepage memory area). Old
flag "--mp-anon" is kept for compatibility.

All external memory is added to the same external heap, but
each mempool will allocate and add its own memory area.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
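
For reference, the four "--mp-alloc" values map naturally onto a small
lookup table. The standalone sketch below shows one way to convert
between the option string and a mode. The enum, its numeric values and
both helper names are hypothetical and do not come from testpmd; only
the four strings are taken from the description above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mode enum; the numeric values are assumptions. */
enum alloc_mode {
	ALLOC_NATIVE,
	ALLOC_ANON,
	ALLOC_XMEM,
	ALLOC_XMEM_HUGE,
	ALLOC_INVALID
};

static const char * const alloc_names[] = {
	[ALLOC_NATIVE] = "native",
	[ALLOC_ANON] = "anon",
	[ALLOC_XMEM] = "xmem",
	[ALLOC_XMEM_HUGE] = "xmemhuge",
};

#define N_ALLOC_NAMES (sizeof(alloc_names) / sizeof(alloc_names[0]))

/* Exact, case-sensitive match; anything else is reported as invalid. */
static enum alloc_mode
alloc_mode_from_str(const char *s)
{
	size_t i;

	if (s == NULL)
		return ALLOC_INVALID;
	for (i = 0; i < N_ALLOC_NAMES; i++)
		if (strcmp(s, alloc_names[i]) == 0)
			return (enum alloc_mode)i;
	return ALLOC_INVALID;
}

static const char *
alloc_mode_to_str(enum alloc_mode mode)
{
	if ((size_t)mode >= N_ALLOC_NAMES)
		return "invalid";
	return alloc_names[mode];
}
```

Using strcmp() rather than a prefix match keeps "xmem" and "xmemhuge"
distinct.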
 app/test-pmd/config.c                 |  21 +-
 app/test-pmd/parameters.c             |  23 +-
 app/test-pmd/testpmd.c                | 325 ++++++++++++++++++++++++--
 app/test-pmd/testpmd.h                |  13 +-
 doc/guides/testpmd_app_ug/run_app.rst |  12 +
 5 files changed, 369 insertions(+), 25 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 794aa5268..3b921cfc6 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -2423,6 +2423,23 @@ fwd_config_setup(void)
 		simple_fwd_config_setup();
 }
 
+static const char *
+mp_alloc_to_str(uint8_t mode)
+{
+	switch (mode) {
+	case MP_ALLOC_NATIVE:
+		return "native";
+	case MP_ALLOC_ANON:
+		return "anon";
+	case MP_ALLOC_XMEM:
+		return "xmem";
+	case MP_ALLOC_XMEM_HUGE:
+		return "xmemhuge";
+	default:
+		return "invalid";
+	}
+}
+
 void
 pkt_fwd_config_display(struct fwd_config *cfg)
 {
@@ -2431,12 +2448,12 @@ pkt_fwd_config_display(struct fwd_config *cfg)
 	streamid_t sm_id;
 
 	printf("%s packet forwarding%s - ports=%d - cores=%d - streams=%d - "
-		"NUMA support %s, MP over anonymous pages %s\n",
+		"NUMA support %s, MP allocation mode: %s\n",
 		cfg->fwd_eng->fwd_mode_name,
 		retry_enabled == 0 ? "" : " with retry",
 		cfg->nb_fwd_ports, cfg->nb_fwd_lcores, cfg->nb_fwd_streams,
 		numa_support == 1 ? "enabled" : "disabled",
-		mp_anon != 0 ? "enabled" : "disabled");
+		mp_alloc_to_str(mp_alloc_type));
 
 	if (retry_enabled)
 		printf("TX retry num: %u, delay between TX retries: %uus\n",
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 9220e1c1b..565bea730 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -190,6 +190,11 @@ usage(char* progname)
 	printf("  --vxlan-gpe-port=N: UPD port of tunnel VXLAN-GPE\n");
 	printf("  --mlockall: lock all memory\n");
 	printf("  --no-mlockall: do not lock all memory\n");
+	printf("  --mp-alloc <native|anon|xmem|xmemhuge>: mempool allocation method.\n"
+	       "    native: use regular DPDK memory to create and populate mempool\n"
+	       "    anon: use regular DPDK memory to create and anonymous memory to populate mempool\n"
+	       "    xmem: use anonymous memory to create and populate mempool\n"
+	       "    xmemhuge: use anonymous hugepage memory to create and populate mempool\n");
 }
 
 #ifdef RTE_LIBRTE_CMDLINE
@@ -625,6 +630,7 @@ launch_args_parse(int argc, char** argv)
 		{ "vxlan-gpe-port",		1, 0, 0 },
 		{ "mlockall",			0, 0, 0 },
 		{ "no-mlockall",		0, 0, 0 },
+		{ "mp-alloc",			1, 0, 0 },
 		{ 0, 0, 0, 0 },
 	};
 
@@ -743,7 +749,22 @@ launch_args_parse(int argc, char** argv)
 			if (!strcmp(lgopts[opt_idx].name, "numa"))
 				numa_support = 1;
 			if (!strcmp(lgopts[opt_idx].name, "mp-anon")) {
-				mp_anon = 1;
+				mp_alloc_type = MP_ALLOC_ANON;
+			}
+			if (!strcmp(lgopts[opt_idx].name, "mp-alloc")) {
+				if (!strcmp(optarg, "native"))
+					mp_alloc_type = MP_ALLOC_NATIVE;
+				else if (!strcmp(optarg, "anon"))
+					mp_alloc_type = MP_ALLOC_ANON;
+				else if (!strcmp(optarg, "xmem"))
+					mp_alloc_type = MP_ALLOC_XMEM;
+				else if (!strcmp(optarg, "xmemhuge"))
+					mp_alloc_type = MP_ALLOC_XMEM_HUGE;
+				else
+					rte_exit(EXIT_FAILURE,
+						"mp-alloc %s invalid - must be: "
+						"native, anon, xmem or xmemhuge\n",
+						 optarg);
 			}
 			if (!strcmp(lgopts[opt_idx].name, "port-numa-config")) {
 				if (parse_portnuma_config(optarg))
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 001f0e552..d9e0a5ddb 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -27,6 +27,7 @@
 #include <rte_log.h>
 #include <rte_debug.h>
 #include <rte_cycles.h>
+#include <rte_malloc_heap.h>
 #include <rte_memory.h>
 #include <rte_memcpy.h>
 #include <rte_launch.h>
@@ -63,6 +64,22 @@
 
 #include "testpmd.h"
 
+#ifndef MAP_HUGETLB
+/* FreeBSD may not have MAP_HUGETLB (in fact, it probably doesn't) */
+#define HUGE_FLAG (0x40000)
+#else
+#define HUGE_FLAG MAP_HUGETLB
+#endif
+
+#ifndef MAP_HUGE_SHIFT
+/* older kernels (or FreeBSD) will not have this define */
+#define HUGE_SHIFT (26)
+#else
+#define HUGE_SHIFT MAP_HUGE_SHIFT
+#endif
+
+#define EXTMEM_HEAP_NAME "extmem"
+
 uint16_t verbose_level = 0; /**< Silent by default. */
 int testpmd_logtype; /**< Log type for testpmd logs */
 
@@ -88,9 +105,13 @@ uint8_t numa_support = 1; /**< numa enabled by default */
 uint8_t socket_num = UMA_NO_CONFIG;
 
 /*
- * Use ANONYMOUS mapped memory (might be not physically continuous) for mbufs.
+ * Select mempool allocation type:
+ * - native: use regular DPDK memory
+ * - anon: use regular DPDK memory to create mempool, but populate using
+ *         anonymous memory (may not be IOVA-contiguous)
+ * - xmem/xmemhuge: use externally allocated regular/hugepage memory
  */
-uint8_t mp_anon = 0;
+uint8_t mp_alloc_type = MP_ALLOC_NATIVE;
 
 /*
  * Store specified sockets on which memory pool to be used by ports
@@ -527,6 +548,236 @@ set_def_fwd_config(void)
 	set_default_fwd_ports_config();
 }
 
+/* extremely pessimistic estimation of memory required to create a mempool */
+static int
+calc_mem_size(uint32_t nb_mbufs, uint32_t mbuf_sz, size_t pgsz, size_t *out)
+{
+	unsigned int n_pages, mbuf_per_pg, leftover;
+	uint64_t total_mem, mbuf_mem, obj_sz;
+
+	/* there is no good way to predict how much space the mempool will
+	 * occupy because it will allocate chunks on the fly, and some of those
+	 * will come from default DPDK memory while some will come from our
+	 * external memory, so just assume 128MB will be enough for everyone.
+	 */
+	uint64_t hdr_mem = 128 << 20;
+
+	/* account for possible non-contiguousness */
+	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
+	if (obj_sz > pgsz) {
+		TESTPMD_LOG(ERR, "Object size is bigger than page size\n");
+		return -1;
+	}
+
+	mbuf_per_pg = pgsz / obj_sz;
+	leftover = (nb_mbufs % mbuf_per_pg) > 0;
+	n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
+
+	mbuf_mem = n_pages * pgsz;
+
+	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
+
+	if (total_mem > SIZE_MAX) {
+		TESTPMD_LOG(ERR, "Memory size too big\n");
+		return -1;
+	}
+	*out = (size_t)total_mem;
+
+	return 0;
+}
+
+static inline uint32_t
+bsf64(uint64_t v)
+{
+	return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+	if (v == 0)
+		return 0;
+	v = rte_align64pow2(v);
+	return bsf64(v);
+}
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+	/* as per mmap() manpage, all page sizes are log2 of page size
+	 * shifted by MAP_HUGE_SHIFT
+	 */
+	int log2 = log2_u64(page_sz);
+
+	return (int)((unsigned int)log2 << HUGE_SHIFT);
+}
+
+static void *
+alloc_mem(size_t memsz, size_t pgsz, bool huge)
+{
+	void *addr;
+	int flags;
+
+	/* allocate anonymous memory, optionally backed by hugepages */
+	flags = MAP_ANONYMOUS | MAP_PRIVATE;
+	if (huge)
+		flags |= HUGE_FLAG | pagesz_flags(pgsz);
+
+	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
+	if (addr == MAP_FAILED)
+		return NULL;
+
+	return addr;
+}
+
+struct extmem_param {
+	void *addr;
+	size_t len;
+	size_t pgsz;
+	rte_iova_t *iova_table;
+	unsigned int iova_table_len;
+};
+
+static int
+create_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, struct extmem_param *param,
+		bool huge)
+{
+	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
+			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
+	unsigned int cur_page, n_pages, pgsz_idx;
+	size_t mem_sz, cur_pgsz;
+	rte_iova_t *iovas = NULL;
+	void *addr;
+	int ret;
+
+	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
+		/* skip anything that is too big */
+		if (pgsizes[pgsz_idx] > SIZE_MAX)
+			continue;
+
+		cur_pgsz = pgsizes[pgsz_idx];
+
+		/* if we were told not to allocate hugepages, override */
+		if (!huge)
+			cur_pgsz = sysconf(_SC_PAGESIZE);
+
+		ret = calc_mem_size(nb_mbufs, mbuf_sz, cur_pgsz, &mem_sz);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot calculate memory size\n");
+			return -1;
+		}
+
+		/* allocate our memory */
+		addr = alloc_mem(mem_sz, cur_pgsz, huge);
+
+		/* if we couldn't allocate memory with a specified page size,
+		 * that doesn't mean we can't do it with other page sizes, so
+		 * try another one.
+		 */
+		if (addr == NULL)
+			continue;
+
+		/* store IOVA addresses for every page in this memory area */
+		n_pages = mem_sz / cur_pgsz;
+
+		iovas = malloc(sizeof(*iovas) * n_pages);
+
+		if (iovas == NULL) {
+			TESTPMD_LOG(ERR, "Cannot allocate memory for iova addresses\n");
+			goto fail;
+		}
+		/* lock memory if it's not huge pages */
+		if (!huge)
+			mlock(addr, mem_sz);
+
+		/* populate IOVA addresses */
+		for (cur_page = 0; cur_page < n_pages; cur_page++) {
+			rte_iova_t iova;
+			size_t offset;
+			void *cur;
+
+			offset = cur_pgsz * cur_page;
+			cur = RTE_PTR_ADD(addr, offset);
+
+			/* touch the page before getting its IOVA */
+			*(volatile char *)cur = 0;
+
+			iova = rte_mem_virt2iova(cur);
+
+			iovas[cur_page] = iova;
+		}
+
+		break;
+	}
+	/* if we couldn't allocate anything */
+	if (iovas == NULL)
+		return -1;
+
+	param->addr = addr;
+	param->len = mem_sz;
+	param->pgsz = cur_pgsz;
+	param->iova_table = iovas;
+	param->iova_table_len = n_pages;
+
+	return 0;
+fail:
+	if (iovas)
+		free(iovas);
+	if (addr)
+		munmap(addr, mem_sz);
+
+	return -1;
+}
+
+static int
+setup_extmem(uint32_t nb_mbufs, uint32_t mbuf_sz, bool huge)
+{
+	struct extmem_param param;
+	int socket_id, ret;
+
+	memset(&param, 0, sizeof(param));
+
+	/* check if our heap exists */
+	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+	if (socket_id < 0) {
+		/* create our heap */
+		ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
+		if (ret < 0) {
+			TESTPMD_LOG(ERR, "Cannot create heap\n");
+			return -1;
+		}
+	}
+
+	ret = create_extmem(nb_mbufs, mbuf_sz, &param, huge);
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot create memory area\n");
+		return -1;
+	}
+
+	/* we now have a valid memory area, so add it to heap */
+	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
+			param.addr, param.len, param.iova_table,
+			param.iova_table_len, param.pgsz);
+
+	/* when using VFIO, memory is automatically mapped for DMA by EAL */
+
+	/* not needed any more */
+	free(param.iova_table);
+
+	if (ret < 0) {
+		TESTPMD_LOG(ERR, "Cannot add memory to heap\n");
+		munmap(param.addr, param.len);
+		return -1;
+	}
+
+	/* success */
+
+	TESTPMD_LOG(DEBUG, "Allocated %zuMB of external memory\n",
+			param.len >> 20);
+
+	return 0;
+}
+
 /*
  * Configuration initialisation done once at init time.
  */
@@ -545,27 +796,59 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
 		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u\n",
 		pool_name, nb_mbuf, mbuf_seg_size, socket_id);
 
-	if (mp_anon != 0) {
-		rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
-			mb_size, (unsigned) mb_mempool_cache,
-			sizeof(struct rte_pktmbuf_pool_private),
-			socket_id, 0);
-		if (rte_mp == NULL)
-			goto err;
+	switch (mp_alloc_type) {
+	case MP_ALLOC_NATIVE:
+		{
+			/* wrapper to rte_mempool_create() */
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+				mb_mempool_cache, 0, mbuf_seg_size, socket_id);
+			break;
+		}
+	case MP_ALLOC_ANON:
+		{
+			rte_mp = rte_mempool_create_empty(pool_name, nb_mbuf,
+				mb_size, (unsigned int) mb_mempool_cache,
+				sizeof(struct rte_pktmbuf_pool_private),
+				socket_id, 0);
+			if (rte_mp == NULL)
+				goto err;
+
+			if (rte_mempool_populate_anon(rte_mp) == 0) {
+				rte_mempool_free(rte_mp);
+				rte_mp = NULL;
+				goto err;
+			}
+			rte_pktmbuf_pool_init(rte_mp, NULL);
+			rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
+			break;
+		}
+	case MP_ALLOC_XMEM:
+	case MP_ALLOC_XMEM_HUGE:
+		{
+			int heap_socket;
+			bool huge = mp_alloc_type == MP_ALLOC_XMEM_HUGE;
 
-		if (rte_mempool_populate_anon(rte_mp) == 0) {
-			rte_mempool_free(rte_mp);
-			rte_mp = NULL;
-			goto err;
+			if (setup_extmem(nb_mbuf, mbuf_seg_size, huge) < 0)
+				rte_exit(EXIT_FAILURE, "Could not create external memory\n");
+
+			heap_socket =
+				rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
+			if (heap_socket < 0)
+				rte_exit(EXIT_FAILURE, "Could not get external memory socket ID\n");
+
+			TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
+					rte_mbuf_best_mempool_ops());
+			rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
+					mb_mempool_cache, 0, mbuf_seg_size,
+					heap_socket);
+			break;
+		}
+	default:
+		{
+			rte_exit(EXIT_FAILURE, "Invalid mempool creation mode\n");
 		}
-		rte_pktmbuf_pool_init(rte_mp, NULL);
-		rte_mempool_obj_iter(rte_mp, rte_pktmbuf_init, NULL);
-	} else {
-		/* wrapper to rte_mempool_create() */
-		TESTPMD_LOG(INFO, "preferred mempool ops selected: %s\n",
-				rte_mbuf_best_mempool_ops());
-		rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
-			mb_mempool_cache, 0, mbuf_seg_size, socket_id);
 	}
 
 err:
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a1f661472..65e0cec90 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -69,6 +69,16 @@ enum {
 	PORT_TOPOLOGY_LOOP,
 };
 
+enum {
+	MP_ALLOC_NATIVE, /**< allocate and populate mempool natively */
+	MP_ALLOC_ANON,
+	/**< allocate mempool natively, but populate using anonymous memory */
+	MP_ALLOC_XMEM,
+	/**< allocate and populate mempool using anonymous memory */
+	MP_ALLOC_XMEM_HUGE
+	/**< allocate and populate mempool using anonymous hugepage memory */
+};
+
 #ifdef RTE_TEST_PMD_RECORD_BURST_STATS
 /**
  * The data structure associated with RX and TX packet burst statistics
@@ -304,7 +314,8 @@ extern uint8_t  numa_support; /**< set by "--numa" parameter */
 extern uint16_t port_topology; /**< set by "--port-topology" parameter */
 extern uint8_t no_flush_rx; /**<set by "--no-flush-rx" parameter */
 extern uint8_t flow_isolate_all; /**< set by "--flow-isolate-all */
-extern uint8_t  mp_anon; /**< set by "--mp-anon" parameter */
+extern uint8_t  mp_alloc_type;
+/**< set by "--mp-anon" or "--mp-alloc" parameter */
 extern uint8_t no_link_check; /**<set by "--disable-link-check" parameter */
 extern volatile int test_done; /* stop packet forwarding when set to 1. */
 extern uint8_t lsc_interrupt; /**< disabled by "--no-lsc-interrupt" parameter */
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f301c2b6f..67a8532a4 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -498,3 +498,15 @@ The commandline options are:
 *   ``--no-mlockall``
 
     Disable locking all memory.
+
+*   ``--mp-alloc <native|anon|xmem|xmemhuge>``
+
+    Select mempool allocation mode:
+
+    * native: create and populate mempool using native DPDK memory
+    * anon: create mempool using native DPDK memory, but populate using
+      anonymous memory
+    * xmem: create and populate mempool using externally and anonymously
+      allocated area
+    * xmemhuge: create and populate mempool using externally and anonymously
+      allocated hugepage area
-- 
2.17.1
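
For readers following the sizing logic in calc_mem_size() in the patch above, here is a minimal standalone sketch of the same arithmetic. It assumes the per-object size is already known (the patch obtains it from rte_mempool_calc_obj_size()). The helper name estimate_area_size and the 2304-byte object size used in the note below are illustrative only and do not appear in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed allowance for mempool headers and bookkeeping, as in the patch. */
#define HDR_MEM ((uint64_t)128 << 20)

/*
 * Hypothetical helper (not part of the patch): estimate how many bytes are
 * needed to hold nb_objs objects of obj_sz bytes each when no object may
 * straddle a page boundary.
 * Returns 0 on success, -1 if the inputs cannot be satisfied.
 */
static int
estimate_area_size(uint32_t nb_objs, uint64_t obj_sz, uint64_t pgsz,
		uint64_t *out)
{
	uint64_t objs_per_pg, n_pages, total;

	if (out == NULL || obj_sz == 0 || pgsz == 0 || obj_sz > pgsz)
		return -1;

	objs_per_pg = pgsz / obj_sz;
	/* round up so that a partially filled last page is still counted */
	n_pages = (nb_objs + objs_per_pg - 1) / objs_per_pg;

	total = HDR_MEM + n_pages * pgsz;
	/* round the total up to a whole number of pages */
	total = (total + pgsz - 1) / pgsz * pgsz;

	*out = total;
	return 0;
}
```

For 1000 objects of 2304 bytes on 2 MB pages, 910 objects fit in each page, so two pages are needed and the estimate comes to 128 MB + 4 MB = 132 MB. The estimate is deliberately pessimistic, matching the comment in the patch.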

* [dpdk-dev] [PATCH v9 20/21] doc: add external memory feature to the release notes
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (19 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 19/21] app/testpmd: add support for external memory Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Document the addition of external memory support to DPDK.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e7674adb9..8fe463d72 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -54,6 +54,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added support for using externally allocated memory in DPDK.**
+
+  DPDK has gained support for creating new ``rte_malloc`` heaps referencing
+  memory that was created outside of DPDK's own page allocator, and using that
+  memory natively with any other DPDK library or data structure.
+
 * **Add support to offload more flow match and actions for CXGBE PMD**
 
   Flow API support has been enhanced for CXGBE Poll Mode Driver to offload:
-- 
2.17.1

* [dpdk-dev] [PATCH v9 21/21] doc: add external memory feature to programmer's guide
  2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
                                 ` (20 preceding siblings ...)
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 20/21] doc: add external memory feature to the release notes Anatoly Burakov
@ 2018-10-02 13:34               ` Anatoly Burakov
  21 siblings, 0 replies; 225+ messages in thread
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add a short chapter on usage of external memory in DPDK to the
Programmer's Guide.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      | 37 +++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d362c9209..00ce64ceb 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -213,6 +213,43 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
+Support for Externally Allocated Memory
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to use externally allocated memory in DPDK, using a set of malloc
+heap APIs. Support for externally allocated memory is implemented through
+overloading the socket ID - externally allocated heaps will have socket IDs
+that would be considered invalid under normal circumstances. Requesting an
+allocation to take place from a specified externally allocated memory area is a
+matter of supplying the correct socket ID to the DPDK allocator, either directly
+(e.g. through a call to ``rte_malloc``) or indirectly (through data
+structure-specific allocation APIs such as ``rte_ring_create``).
+
+Since there is no way DPDK can verify whether a memory area is available or valid,
+this responsibility falls on the shoulders of the user. All multiprocess
+synchronization is also the user's responsibility, as well as ensuring that all
+calls to add/attach/detach/remove memory are done in the correct order. It is
+not required to attach to a memory area in all processes - only attach to memory
+areas as needed.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Create a named heap
+* Add memory area(s) to the heap
+    - If IOVA table is not specified, IOVA addresses will be assumed to be
+      unavailable, and DMA mappings will not be performed
+    - Other processes must attach to the memory area before they can use it
+* Get socket ID used for the heap
+* Use normal DPDK allocation procedures, using supplied socket ID
+* If memory area is no longer needed, it can be removed from the heap
+    - Other processes must detach from this memory area before it can be removed
+* If heap is no longer needed, remove it
+    - Socket ID will become invalid and will not be reused
+
+For more information, please refer to ``rte_malloc`` API documentation,
+specifically the ``rte_malloc_heap_*`` family of function calls.
+
 PCI Access
 ~~~~~~~~~~
 
-- 
2.17.1
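
The first step of the workflow in the guide text above, getting a pointer to a memory area, happens entirely outside DPDK. Below is a minimal sketch of that step only, assuming Linux mmap(); the helper names page_shift, hugepage_size_flag and alloc_area are illustrative and are not DPDK API. The subsequent rte_malloc_heap_* calls are left out because they need an initialised EAL.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS/MAP_HUGETLB on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26 /* same fallback value the testpmd patch uses */
#endif

/* log2 of a power-of-two page size, e.g. 2 MB -> 21 */
static int
page_shift(uint64_t page_sz)
{
	int shift = 0;

	while (page_sz > 1) {
		page_sz >>= 1;
		shift++;
	}
	return shift;
}

/* mmap() takes the hugepage size as log2(size) << MAP_HUGE_SHIFT */
static int
hugepage_size_flag(uint64_t page_sz)
{
	/* shift as unsigned: 16 GB pages would otherwise overflow int */
	return (int)((unsigned int)page_shift(page_sz) << MAP_HUGE_SHIFT);
}

/*
 * Map len bytes of anonymous memory and touch every page so that each one
 * is backed before its address is handed to anyone else.
 * pgsz is the page size backing the area; huge != 0 requests hugepages.
 * Returns NULL on failure.
 */
static void *
alloc_area(size_t len, size_t pgsz, int huge)
{
	int flags = MAP_ANONYMOUS | MAP_PRIVATE;
	size_t off;
	void *addr;

	if (len == 0 || pgsz == 0)
		return NULL;

	if (huge) {
#ifdef MAP_HUGETLB
		flags |= MAP_HUGETLB | hugepage_size_flag(pgsz);
#else
		return NULL; /* no hugepage mmap support on this platform */
#endif
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

	for (off = 0; off < len; off += pgsz)
		*((volatile char *)addr + off) = 0;

	return addr;
}
```

The returned pointer, the length and the page size are the values that would then be passed to ``rte_malloc_heap_memory_add``, as setup_extmem() in the testpmd patch of this series does. Releasing the area with munmap() remains the caller's job once it has been removed from the heap.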

* Re: [dpdk-dev] [PATCH v9 19/21] app/testpmd: add support for external memory
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 19/21] app/testpmd: add support for external memory Anatoly Burakov
@ 2018-10-02 14:05                 ` Iremonger, Bernard
  0 siblings, 0 replies; 225+ messages in thread
From: Iremonger, Bernard @ 2018-10-02 14:05 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Lu, Wenzhuo, Wu, Jingjing, Mcnamara, John, Kovacevic, Marko,
	laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, Wiles, Keith, Richardson, Bruce,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, October 2, 2018 2:35 PM
> To: dev@dpdk.org
> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Iremonger, Bernard <bernard.iremonger@intel.com>;
> Mcnamara, John <john.mcnamara@intel.com>; Kovacevic, Marko
> <marko.kovacevic@intel.com>; laszlo.madarassy@ericsson.com;
> laszlo.vadkerti@ericsson.com; andras.kovacs@ericsson.com;
> winnie.tian@ericsson.com; daniel.andrasi@ericsson.com;
> janos.kobor@ericsson.com; geza.koblo@ericsson.com;
> srinath.mannam@broadcom.com; scott.branden@broadcom.com;
> ajit.khaparde@broadcom.com; Wiles, Keith <keith.wiles@intel.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; thomas@monjalon.net;
> shreyansh.jain@nxp.com; shahafs@mellanox.com;
> arybchenko@solarflare.com; alejandro.lucero@netronome.com
> Subject: [PATCH v9 19/21] app/testpmd: add support for external memory
> 
> Currently, mempools can only be allocated either using native DPDK memory, or
> anonymous memory. This patch will add two new methods to allocate mempool
> using external memory (regular or hugepage memory), and add documentation
> about it to testpmd user guide.
> 
> It adds a new flag "--mp-alloc", with four possible values:
> native (use regular DPDK allocator), anon (use anonymous mempool), xmem
> (use externally allocated memory area), and xmemhuge (use externally allocated
> hugepage memory area). Old flag "--mp-anon" is kept for compatibility.
> 
> All external memory is allocated using the same external heap, but each mempool
> will allocate and add a new memory area.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
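
The four "--mp-alloc" values described in the commit message map onto a small enum in testpmd.h. As an illustration only, the sketch below keeps both directions of that mapping in one table; the patch itself uses an if/else chain in launch_args_parse() and a switch in mp_alloc_to_str(). The table name mp_alloc_names and the helper mp_alloc_from_str are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the enum the patch adds to testpmd.h. */
enum {
	MP_ALLOC_NATIVE,
	MP_ALLOC_ANON,
	MP_ALLOC_XMEM,
	MP_ALLOC_XMEM_HUGE
};

/* One table for both directions (hypothetical, not in the patch). */
static const char *const mp_alloc_names[] = {
	[MP_ALLOC_NATIVE] = "native",
	[MP_ALLOC_ANON] = "anon",
	[MP_ALLOC_XMEM] = "xmem",
	[MP_ALLOC_XMEM_HUGE] = "xmemhuge",
};

#define MP_ALLOC_COUNT (sizeof(mp_alloc_names) / sizeof(mp_alloc_names[0]))

/* Returns the mode for a --mp-alloc argument, or -1 if it is not recognised. */
static int
mp_alloc_from_str(const char *arg)
{
	size_t i;

	if (arg == NULL)
		return -1;
	for (i = 0; i < MP_ALLOC_COUNT; i++)
		if (strcmp(arg, mp_alloc_names[i]) == 0)
			return (int)i;
	return -1;
}

/* Returns the option string for a mode, or "invalid" if out of range. */
static const char *
mp_alloc_to_str(int mode)
{
	if (mode < 0 || (size_t)mode >= MP_ALLOC_COUNT)
		return "invalid";
	return mp_alloc_names[mode];
}
```

A caller would treat a negative return from mp_alloc_from_str() the way the patch treats an unknown value, that is, by exiting with an error that lists the accepted strings.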

* Re: [dpdk-dev] [PATCH v9 00/21] Support externally allocated memory in DPDK
  2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
@ 2018-10-11  9:15                 ` Thomas Monjalon
  0 siblings, 0 replies; 225+ messages in thread
From: Thomas Monjalon @ 2018-10-11  9:15 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	bruce.richardson, shreyansh.jain, shahafs, arybchenko,
	alejandro.lucero

> Anatoly Burakov (21):
>   mem: add length to memseg list
>   mem: allow memseg lists to be marked as external
>   malloc: index heaps using heap ID rather than NUMA node
>   mem: do not check for invalid socket ID
>   flow_classify: do not check for invalid socket ID
>   pipeline: do not check for invalid socket ID
>   sched: do not check for invalid socket ID
>   malloc: add name to malloc heaps
>   malloc: add function to query socket ID of named heap
>   malloc: add function to check if socket is external
>   malloc: allow creating malloc heaps
>   malloc: allow destroying heaps
>   malloc: allow adding memory to named heaps
>   malloc: allow removing memory from named heaps
>   malloc: allow attaching to external memory chunks
>   malloc: allow detaching from external memory
>   malloc: enable event callbacks for external memory
>   test: add unit tests for external memory support
>   app/testpmd: add support for external memory
>   doc: add external memory feature to the release notes
>   doc: add external memory feature to programmer's guide

last 2 patches merged together

Applied, thanks

end of thread, other threads:[~2018-10-11  9:15 UTC | newest]

Thread overview: 225+ messages
2018-09-04 13:11 [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 01/16] mem: add length to memseg list Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 02/16] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 03/16] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 04/16] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 05/16] flow_classify: " Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 06/16] pipeline: " Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 07/16] sched: " Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 08/16] malloc: add name to malloc heaps Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 09/16] malloc: add function to query socket ID of named heap Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 10/16] malloc: allow creating malloc heaps Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 11/16] malloc: allow destroying heaps Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 12/16] malloc: allow adding memory to named heaps Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 13/16] malloc: allow removing memory from " Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 14/16] malloc: allow attaching to external memory chunks Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 15/16] malloc: allow detaching from external memory Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 16/16] test: add unit tests for external memory support Anatoly Burakov
2018-09-13  7:44 ` [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK Shahaf Shuler
2018-09-17 10:07   ` Burakov, Anatoly
2018-09-17 12:16     ` Shahaf Shuler
2018-09-17 13:00       ` Burakov, Anatoly
2018-09-18 12:29         ` Shreyansh Jain
2018-09-18 15:15           ` Burakov, Anatoly
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 00/20] " Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 " Anatoly Burakov
2018-09-23 21:21       ` Thomas Monjalon
2018-09-24  8:54         ` Burakov, Anatoly
2018-09-26 11:21       ` [dpdk-dev] [PATCH v5 00/21] " Anatoly Burakov
2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
2018-10-11  9:15                 ` Thomas Monjalon
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 05/21] flow_classify: " Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 06/21] pipeline: " Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 07/21] sched: " Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 10/21] malloc: add function to check if socket is external Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 11/21] malloc: allow creating malloc heaps Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 12/21] malloc: allow destroying heaps Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 14/21] malloc: allow removing memory from " Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 16/21] malloc: allow detaching from external memory Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 17/21] malloc: enable event callbacks for " Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 18/21] test: add unit tests for external memory support Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 19/21] app/testpmd: add support for external memory Anatoly Burakov
2018-10-02 14:05                 ` Iremonger, Bernard
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 20/21] doc: add external memory feature to the release notes Anatoly Burakov
2018-10-02 13:34               ` [dpdk-dev] [PATCH v9 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
2018-10-01 17:01               ` Stephen Hemminger
2018-10-02  9:03                 ` Burakov, Anatoly
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 05/21] flow_classify: " Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 06/21] pipeline: " Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 07/21] sched: " Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 10/21] malloc: add function to check if socket is external Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 11/21] malloc: allow creating malloc heaps Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 12/21] malloc: allow destroying heaps Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 14/21] malloc: allow removing memory from " Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 16/21] malloc: allow detaching from external memory Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 17/21] malloc: enable event callbacks for " Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 18/21] test: add unit tests for external memory support Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 19/21] app/testpmd: add support for external memory Anatoly Burakov
2018-10-01 15:11               ` Iremonger, Bernard
2018-10-01 15:23                 ` Burakov, Anatoly
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 20/21] doc: add external memory feature to the release notes Anatoly Burakov
2018-10-01 12:56             ` [dpdk-dev] [PATCH v8 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 01/21] mem: add length to memseg list Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 05/21] flow_classify: " Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 06/21] pipeline: " Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 07/21] sched: " Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
2018-10-01 11:04           ` [dpdk-dev] [PATCH v7 10/21] malloc: add function to check if socket is external Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 11/21] malloc: allow creating malloc heaps Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 12/21] malloc: allow destroying heaps Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 14/21] malloc: allow removing memory from " Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 16/21] malloc: allow detaching from external memory Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 17/21] malloc: enable event callbacks for " Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 18/21] test: add unit tests for external memory support Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 19/21] app/testpmd: add support for external memory Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 20/21] doc: add external memory feature to the release notes Anatoly Burakov
2018-10-01 11:05           ` [dpdk-dev] [PATCH v7 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 01/21] mem: add length to memseg list Anatoly Burakov
2018-09-27 11:05           ` Shreyansh Jain
2018-09-27 10:40         ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-27 11:03           ` Shreyansh Jain
2018-09-27 11:08             ` Burakov, Anatoly
2018-09-27 11:12               ` Shreyansh Jain
2018-09-27 11:29                 ` Burakov, Anatoly
2018-09-29  0:09           ` Yongseok Koh
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-09-27 13:01           ` Alejandro Lucero
2018-09-27 13:18             ` Burakov, Anatoly
2018-09-27 13:21               ` Alejandro Lucero
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-27 13:14           ` Alejandro Lucero
2018-09-27 13:21             ` Burakov, Anatoly
2018-09-27 13:42               ` Alejandro Lucero
2018-09-27 14:04                 ` Burakov, Anatoly
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 05/21] flow_classify: " Anatoly Burakov
2018-09-27 16:14           ` Iremonger, Bernard
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 06/21] pipeline: " Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 07/21] sched: " Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 10/21] malloc: add function to check if socket is external Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating malloc heaps Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 12/21] malloc: allow destroying heaps Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 14/21] malloc: allow removing memory from " Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 16/21] malloc: allow detaching from external memory Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 17/21] malloc: enable event callbacks for " Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 18/21] test: add unit tests for external memory support Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 19/21] app/testpmd: add support for external memory Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 20/21] doc: add external memory feature to the release notes Anatoly Burakov
2018-09-27 10:41         ` [dpdk-dev] [PATCH v6 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 01/21] mem: add length to memseg list Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 05/21] flow_classify: " Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 06/21] pipeline: " Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 07/21] sched: " Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 09/21] malloc: add function to query socket ID of named heap Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 10/21] malloc: add function to check if socket is external Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating malloc heaps Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 12/21] malloc: allow destroying heaps Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 13/21] malloc: allow adding memory to named heaps Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 14/21] malloc: allow removing memory from " Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 15/21] malloc: allow attaching to external memory chunks Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 16/21] malloc: allow detaching from external memory Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 17/21] malloc: enable event callbacks for " Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 18/21] test: add unit tests for external memory support Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 19/21] app/testpmd: add support for external memory Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 20/21] doc: add external memory feature to the release notes Anatoly Burakov
2018-09-26 11:22       ` [dpdk-dev] [PATCH v5 21/21] doc: add external memory feature to programmer's guide Anatoly Burakov
2018-09-26 15:19         ` Kovacevic, Marko
2018-09-26 16:00           ` Burakov, Anatoly
2018-09-26 16:17             ` Kovacevic, Marko
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 01/20] mem: add length to memseg list Anatoly Burakov
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 03/20] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 05/20] flow_classify: " Anatoly Burakov
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 06/20] pipeline: " Anatoly Burakov
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 07/20] sched: " Anatoly Burakov
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 08/20] malloc: add name to malloc heaps Anatoly Burakov
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 09/20] malloc: add function to query socket ID of named heap Anatoly Burakov
2018-09-21 16:13     ` [dpdk-dev] [PATCH v4 10/20] malloc: add function to check if socket is external Anatoly Burakov
2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 11/20] malloc: allow creating malloc heaps Anatoly Burakov
2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 12/20] malloc: allow destroying heaps Anatoly Burakov
2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 13/20] malloc: allow adding memory to named heaps Anatoly Burakov
2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 14/20] malloc: allow removing memory from " Anatoly Burakov
2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 15/20] malloc: allow attaching to external memory chunks Anatoly Burakov
2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 16/20] malloc: allow detaching from external memory Anatoly Burakov
2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 17/20] test: add unit tests for external memory support Anatoly Burakov
2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 18/20] app/testpmd: add support for external memory Anatoly Burakov
2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 19/20] doc: add external memory feature to the release notes Anatoly Burakov
2018-09-21 16:14     ` [dpdk-dev] [PATCH v4 20/20] doc: add external memory feature to programmer's guide Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 01/20] mem: add length to memseg list Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 03/20] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 05/20] flow_classify: " Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 06/20] pipeline: " Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 07/20] sched: " Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 08/20] malloc: add name to malloc heaps Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 09/20] malloc: add function to query socket ID of named heap Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 10/20] malloc: allow creating malloc heaps Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 11/20] malloc: allow destroying heaps Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 12/20] malloc: allow adding memory to named heaps Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 13/20] malloc: allow removing memory from " Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 14/20] malloc: allow attaching to external memory chunks Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 15/20] malloc: allow detaching from external memory Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 16/20] test: add unit tests for external memory support Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 17/20] examples: add external memory example app Anatoly Burakov
2018-09-20 22:47     ` Ananyev, Konstantin
2018-09-21  9:03       ` Burakov, Anatoly
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 18/20] doc: add external memory feature to the release notes Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 19/20] doc: add external memory feature to programmer's guide Anatoly Burakov
2018-09-20 11:36   ` [dpdk-dev] [PATCH v3 20/20] doc: add external memory sample application guide Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 01/20] mem: add length to memseg list Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-20  9:30   ` Andrew Rybchenko
2018-09-20  9:54     ` Burakov, Anatoly
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 03/20] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 05/20] flow_classify: " Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 06/20] pipeline: " Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 07/20] sched: " Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 08/20] malloc: add name to malloc heaps Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 09/20] malloc: add function to query socket ID of named heap Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 10/20] malloc: allow creating malloc heaps Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 11/20] malloc: allow destroying heaps Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 12/20] malloc: allow adding memory to named heaps Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 13/20] malloc: allow removing memory from " Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 14/20] malloc: allow attaching to external memory chunks Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 15/20] malloc: allow detaching from external memory Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 16/20] test: add unit tests for external memory support Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 17/20] examples: add external memory example app Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 18/20] doc: add external memory feature to the release notes Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 19/20] doc: add external memory feature to programmer's guide Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 20/20] doc: add external memory sample application guide Anatoly Burakov
