DPDK patches and discussions
* [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc
@ 2018-11-29 13:48 Anatoly Burakov
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
                   ` (9 more replies)
  0 siblings, 10 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-11-29 13:48 UTC
  To: dev; +Cc: shahafs, yskoh, thomas, shreyansh.jain

Currently, the only way to use externally allocated memory
is through the rte_malloc APIs. While this is fine for many
use cases, it is not suitable for others, such as manual
memory management.

This patchset adds another API to register memory segments
with DPDK (so that APIs like ``rte_mem_virt2memseg`` can
be relied on by PMDs and such) without creating a malloc
heap out of them.

Aside from the obvious (not adding memory to a heap), the
other major difference between this API and the
``rte_malloc_heap_*`` external memory functions is the fact
that no DMA mapping is performed automatically.

This draws a clear line: there are now two ways of doing
things. Either do everything automatically (using the
``rte_malloc_heap_*`` APIs), or do everything manually
(``rte_extmem_*`` and a future DMA mapping API [1] that would
replace ``rte_vfio_dma_map``). This keeps the API consistent
while still allowing flexibility.

[1] https://mails.dpdk.org/archives/dev/2018-November/118175.html

Note: at the time of this writing, there's no release notes
      template, so no release notes updates in this patchset.
      They will be added in later revisions.

Anatoly Burakov (4):
  malloc: separate creating memseg list and malloc heap
  malloc: separate destroying memseg list and heap data
  mem: allow registering external memory areas
  mem: allow usage of non-heap external memory in multiprocess

 .../prog_guide/env_abstraction_layer.rst      |  63 +++++++--
 lib/librte_eal/common/eal_common_memory.c     | 116 +++++++++++++++++
 lib/librte_eal/common/include/rte_memory.h    | 122 ++++++++++++++++++
 lib/librte_eal/common/malloc_heap.c           | 104 +++++++++++----
 lib/librte_eal/common/malloc_heap.h           |  15 ++-
 lib/librte_eal/common/rte_malloc.c            | 115 +++++++----------
 lib/librte_eal/rte_eal_version.map            |   4 +
 7 files changed, 434 insertions(+), 105 deletions(-)

-- 
2.17.1


* [dpdk-dev] [PATCH 1/4] malloc: separate creating memseg list and malloc heap
  2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
@ 2018-11-29 13:48 ` Anatoly Burakov
  2018-12-14  9:33   ` Yongseok Koh
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Anatoly Burakov @ 2018-11-29 13:48 UTC
  To: dev; +Cc: shahafs, yskoh, thomas, shreyansh.jain

Currently, creating an external malloc heap also involves creating
the memseg list backing that heap. Split these into separate
functions, so that memseg lists can be created without creating
a malloc heap.
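
To illustrate what "creating a memseg list" amounts to here, below is a
minimal standalone sketch (not the DPDK implementation). It describes an
area as n_pages entries of page_sz bytes each, and falls back to a "bad
IOVA" marker when no IOVA table is given, as the doc comment in patch 3
describes. The names ``page_desc``, ``describe_pages`` and ``BAD_IOVA``
are hypothetical stand-ins for ``struct rte_memseg``, the loop in
``malloc_heap_create_external_seg`` and ``RTE_BAD_IOVA``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RTE_BAD_IOVA (assumption, not the DPDK macro). */
#define BAD_IOVA UINT64_MAX

/* Hypothetical, simplified stand-in for struct rte_memseg. */
struct page_desc {
	void *addr;    /* virtual address of this page */
	uint64_t iova; /* IO address of this page, or BAD_IOVA */
	size_t len;    /* always one page */
};

/*
 * Fill one descriptor per page. If iova_addrs is NULL, every page is
 * marked as having no known IO address.
 */
static void
describe_pages(struct page_desc *pages, void *va_addr,
		const uint64_t *iova_addrs, unsigned int n_pages,
		size_t page_sz)
{
	unsigned int i;

	assert(pages != NULL && va_addr != NULL && page_sz != 0);

	for (i = 0; i < n_pages; i++) {
		pages[i].addr = (char *)va_addr + (size_t)i * page_sz;
		pages[i].iova = iova_addrs == NULL ? BAD_IOVA : iova_addrs[i];
		pages[i].len = page_sz;
	}
}
```

Nothing in this step touches a heap, which is why it can be split out
and reused by a non-heap registration path.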

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/malloc_heap.c | 34 ++++++++++++++++++-----------
 lib/librte_eal/common/malloc_heap.h |  9 ++++++--
 lib/librte_eal/common/rte_malloc.c  | 11 ++++++++--
 3 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index c6a6d4f6b..25693481f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1095,9 +1095,10 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 	return 0;
 }
 
-int
-malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
-		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+struct rte_memseg_list *
+malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	char fbarray_name[RTE_FBARRAY_NAME_LEN];
@@ -1117,17 +1118,17 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	if (msl == NULL) {
 		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
 		rte_errno = ENOSPC;
-		return -1;
+		return NULL;
 	}
 
 	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
-			heap->name, va_addr);
+			seg_name, va_addr);
 
 	/* create the backing fbarray */
 	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
 			sizeof(struct rte_memseg)) < 0) {
 		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
-		return -1;
+		return NULL;
 	}
 	arr = &msl->memseg_arr;
 
@@ -1143,32 +1144,39 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		ms->len = page_sz;
 		ms->nchannel = rte_memory_get_nchannel();
 		ms->nrank = rte_memory_get_nrank();
-		ms->socket_id = heap->socket_id;
+		ms->socket_id = socket_id;
 	}
 
 	/* set up the memseg list */
 	msl->base_va = va_addr;
 	msl->page_sz = page_sz;
-	msl->socket_id = heap->socket_id;
+	msl->socket_id = socket_id;
 	msl->len = seg_len;
 	msl->version = 0;
 	msl->external = 1;
 
+	return msl;
+}
+
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap,
+		struct rte_memseg_list *msl)
+{
 	/* erase contents of new memory */
-	memset(va_addr, 0, seg_len);
+	memset(msl->base_va, 0, msl->len);
 
 	/* now, add newly minted memory to the malloc heap */
-	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+	malloc_heap_add_memory(heap, msl, msl->base_va, msl->len);
 
-	heap->total_size += seg_len;
+	heap->total_size += msl->len;
 
 	/* all done! */
 	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
-			heap->name, va_addr);
+			heap->name, msl->base_va);
 
 	/* notify all subscribers that a new memory area has been added */
 	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
-			va_addr, seg_len);
+			msl->base_va, msl->len);
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index e48996d52..255a315b8 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,9 +39,14 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+struct rte_memseg_list *
+malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id);
+
 int
-malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
-		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+malloc_heap_add_external_memory(struct malloc_heap *heap,
+		struct rte_memseg_list *msl);
 
 int
 malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 0da5ad5e8..66bfe63c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -340,6 +340,7 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
+	struct rte_memseg_list *msl;
 	unsigned int n;
 	int ret;
 
@@ -373,9 +374,15 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		goto unlock;
 	}
 
+	msl = malloc_heap_create_external_seg(va_addr, iova_addrs, n, page_sz,
+			heap_name, heap->socket_id);
+	if (msl == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
 	rte_spinlock_lock(&heap->lock);
-	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
-			page_sz);
+	ret = malloc_heap_add_external_memory(heap, msl);
 	rte_spinlock_unlock(&heap->lock);
 
 unlock:
-- 
2.17.1


* [dpdk-dev] [PATCH 2/4] malloc: separate destroying memseg list and heap data
  2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
@ 2018-11-29 13:48 ` Anatoly Burakov
  2018-12-14  9:34   ` Yongseok Koh
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas Anatoly Burakov
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Anatoly Burakov @ 2018-11-29 13:48 UTC
  To: dev; +Cc: shahafs, yskoh, thomas, shreyansh.jain

Currently, destroying an external heap chunk and destroying its
memseg list are done as a single step. Once we gain the ability to
unregister external memory that has no heap structures associated
with it, we will need to be able to find and destroy memseg lists
separately from heap data.
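
The lookup half of this split can be pictured with the standalone sketch
below. It mirrors the matching rule used by ``extseg_walk`` in this patch
(base address and length must both match exactly), but runs over a plain
array instead of DPDK's memseg list walker. ``seg_list`` and
``find_external_seg`` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct rte_memseg_list. */
struct seg_list {
	void *base_va;
	size_t len;
};

/*
 * Return the entry whose base address and length both match exactly,
 * or NULL if there is none. A pointer into the middle of a registered
 * area, or a different length, is deliberately not a match.
 */
static struct seg_list *
find_external_seg(struct seg_list *lists, size_t n_lists,
		void *va_addr, size_t len)
{
	size_t i;

	assert(lists != NULL || n_lists == 0);

	for (i = 0; i < n_lists; i++) {
		if (lists[i].base_va == va_addr && lists[i].len == len)
			return &lists[i];
	}
	return NULL;
}
```

Because the match is exact, callers have to pass the same address and
length that were used when the area was added.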

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/malloc_heap.c |  70 +++++++++++++++----
 lib/librte_eal/common/malloc_heap.h |   6 ++
 lib/librte_eal/common/rte_malloc.c  | 104 ++++++++++------------------
 3 files changed, 102 insertions(+), 78 deletions(-)

diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 25693481f..fa0cb0799 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1067,12 +1067,9 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 }
 
 static int
-destroy_seg(struct malloc_elem *elem, size_t len)
+destroy_elem(struct malloc_elem *elem, size_t len)
 {
 	struct malloc_heap *heap = elem->heap;
-	struct rte_memseg_list *msl;
-
-	msl = elem->msl;
 
 	/* notify all subscribers that a memory area is going to be removed */
 	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, elem, len);
@@ -1085,13 +1082,6 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 
 	memset(elem, 0, sizeof(*elem));
 
-	/* destroy the fbarray backing this memory */
-	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
-		return -1;
-
-	/* reset the memseg list */
-	memset(msl, 0, sizeof(*msl));
-
 	return 0;
 }
 
@@ -1158,6 +1148,62 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 	return msl;
 }
 
+struct extseg_walk_arg {
+	void *va_addr;
+	size_t len;
+	struct rte_memseg_list *msl;
+};
+
+static int
+extseg_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct extseg_walk_arg *wa = arg;
+
+	if (msl->base_va == wa->va_addr && msl->len == wa->len) {
+		unsigned int found_idx;
+
+		/* msl is const */
+		found_idx = msl - mcfg->memsegs;
+		wa->msl = &mcfg->memsegs[found_idx];
+		return 1;
+	}
+	return 0;
+}
+
+struct rte_memseg_list *
+malloc_heap_find_external_seg(void *va_addr, size_t len)
+{
+	struct extseg_walk_arg wa;
+	int res;
+
+	wa.va_addr = va_addr;
+	wa.len = len;
+
+	res = rte_memseg_list_walk_thread_unsafe(extseg_walk, &wa);
+
+	if (res != 1) {
+		/* 0 means nothing was found, -1 shouldn't happen */
+		if (res == 0)
+			rte_errno = ENOENT;
+		return NULL;
+	}
+	return wa.msl;
+}
+
+int
+malloc_heap_destroy_external_seg(struct rte_memseg_list *msl)
+{
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap,
 		struct rte_memseg_list *msl)
@@ -1206,7 +1252,7 @@ malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_errno = EBUSY;
 		return -1;
 	}
-	return destroy_seg(elem, len);
+	return destroy_elem(elem, len);
 }
 
 int
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 255a315b8..ca9ff666f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -44,6 +44,12 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz, const char *seg_name,
 		unsigned int socket_id);
 
+struct rte_memseg_list *
+malloc_heap_find_external_seg(void *va_addr, size_t len);
+
+int
+malloc_heap_destroy_external_seg(struct rte_memseg_list *msl);
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap,
 		struct rte_memseg_list *msl);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 66bfe63c3..9a82e3386 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -396,6 +396,7 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
+	struct rte_memseg_list *msl;
 	int ret;
 
 	if (heap_name == NULL || va_addr == NULL || len == 0 ||
@@ -420,9 +421,19 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 		goto unlock;
 	}
 
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
 	rte_spinlock_lock(&heap->lock);
 	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
 	rte_spinlock_unlock(&heap->lock);
+	if (ret != 0)
+		goto unlock;
+
+	ret = malloc_heap_destroy_external_seg(msl);
 
 unlock:
 	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
@@ -430,63 +441,12 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
-struct sync_mem_walk_arg {
-	void *va_addr;
-	size_t len;
-	int result;
-	bool attach;
-};
-
-static int
-sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
-{
-	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct sync_mem_walk_arg *wa = arg;
-	size_t len = msl->page_sz * msl->memseg_arr.len;
-
-	if (msl->base_va == wa->va_addr &&
-			len == wa->len) {
-		struct rte_memseg_list *found_msl;
-		int msl_idx, ret;
-
-		/* msl is const */
-		msl_idx = msl - mcfg->memsegs;
-		found_msl = &mcfg->memsegs[msl_idx];
-
-		if (wa->attach) {
-			ret = rte_fbarray_attach(&found_msl->memseg_arr);
-		} else {
-			/* notify all subscribers that a memory area is about to
-			 * be removed
-			 */
-			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
-					msl->base_va, msl->len);
-			ret = rte_fbarray_detach(&found_msl->memseg_arr);
-		}
-
-		if (ret < 0) {
-			wa->result = -rte_errno;
-		} else {
-			/* notify all subscribers that a new memory area was
-			 * added
-			 */
-			if (wa->attach)
-				eal_memalloc_mem_event_notify(
-						RTE_MEM_EVENT_ALLOC,
-						msl->base_va, msl->len);
-			wa->result = 0;
-		}
-		return 1;
-	}
-	return 0;
-}
-
 static int
 sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
-	struct sync_mem_walk_arg wa;
+	struct rte_memseg_list *msl;
 	int ret;
 
 	if (heap_name == NULL || va_addr == NULL || len == 0 ||
@@ -513,23 +473,35 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 	}
 
 	/* find corresponding memseg list to sync to */
-	wa.va_addr = va_addr;
-	wa.len = len;
-	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
-	wa.attach = attach;
-
-	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
-
-	if (wa.result < 0) {
-		rte_errno = -wa.result;
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
 		ret = -1;
-	} else {
-		/* notify all subscribers that a new memory area was added */
-		if (attach)
+		goto unlock;
+	}
+
+	if (attach) {
+		ret = rte_fbarray_attach(&msl->memseg_arr);
+		if (ret == 0) {
+			/* notify all subscribers that a new memory area was
+			 * added.
+			 */
 			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
 					va_addr, len);
-		ret = 0;
+		} else {
+			ret = -1;
+			goto unlock;
+		}
+	} else {
+		/* notify all subscribers that a memory area is about to
+		 * be removed.
+		 */
+		eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+				msl->base_va, msl->len);
+		ret = rte_fbarray_detach(&msl->memseg_arr);
+		if (ret < 0) {
+			ret = -1;
+			goto unlock;
+		}
 	}
 unlock:
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
-- 
2.17.1


* [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas
  2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
@ 2018-11-29 13:48 ` Anatoly Burakov
  2018-12-14  9:55   ` Yongseok Koh
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Anatoly Burakov @ 2018-11-29 13:48 UTC
  To: dev
  Cc: John McNamara, Marko Kovacevic, shahafs, yskoh, thomas, shreyansh.jain

The general use case of using external memory is well covered by
the existing external memory APIs. However, certain use cases require
manual management of externally allocated memory areas, so this
memory should not be added to the heap. It should, however, be
added to DPDK's internal structures, so that APIs like
``rte_mem_virt2memseg`` work on such external memory segments.

This commit adds such an API to DPDK. The new functions allow
registering and unregistering externally allocated memory areas.
Documentation for them is added as well.
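
For callers, the argument checks at the top of ``rte_extmem_register``
can be approximated ahead of time. The standalone sketch below restates
them without DPDK headers: non-NULL address, non-zero length,
power-of-two page size, and a length that is a whole number of pages.
``extmem_args_valid`` and ``extmem_page_count`` are hypothetical helper
names. The checks visible in this patch do not compare n_pages against
len / page_sz, so a caller may want to compute the expected count itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical pre-check mirroring the EINVAL conditions of
 * rte_extmem_register() as posted in this patch.
 */
static bool
extmem_args_valid(const void *va_addr, size_t len, size_t page_sz)
{
	if (va_addr == NULL || len == 0 || page_sz == 0)
		return false;
	/* page size must be a power of two */
	if ((page_sz & (page_sz - 1)) != 0)
		return false;
	/* length must be a whole number of pages */
	if ((len & (page_sz - 1)) != 0)
		return false;
	return true;
}

/* Number of IOVA table entries expected for a valid (len, page_sz) pair. */
static size_t
extmem_page_count(size_t len, size_t page_sz)
{
	assert(page_sz != 0);
	return len / page_sz;
}
```

An application could call ``extmem_args_valid`` before registering and
size its IOVA array with ``extmem_page_count``.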

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
 lib/librte_eal/common/eal_common_memory.c     | 74 +++++++++++++++++++
 lib/librte_eal/common/include/rte_memory.h    | 63 ++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  2 +
 4 files changed, 189 insertions(+), 10 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 8b5d050c7..d7799b626 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
 Support for Externally Allocated Memory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-It is possible to use externally allocated memory in DPDK, using a set of malloc
-heap API's. Support for externally allocated memory is implemented through
-overloading the socket ID - externally allocated heaps will have socket ID's
-that would be considered invalid under normal circumstances. Requesting an
-allocation to take place from a specified externally allocated memory is a
-matter of supplying the correct socket ID to DPDK allocator, either directly
-(e.g. through a call to ``rte_malloc``) or indirectly (through data
-structure-specific allocation API's such as ``rte_ring_create``).
+It is possible to use externally allocated memory in DPDK. There are two ways in
+which using externally allocated memory can work: the malloc heap API's, and
+manual memory management.
 
-Since there is no way DPDK can verify whether memory are is available or valid,
-this responsibility falls on the shoulders of the user. All multiprocess
++ Using heap API's for externally allocated memory
+
+Using a set of malloc heap API's is the recommended way to use externally
+allocated memory in DPDK. In this way, support for externally allocated memory
+is implemented through overloading the socket ID - externally allocated heaps
+will have socket ID's that would be considered invalid under normal
+circumstances. Requesting an allocation to take place from a specified
+externally allocated memory is a matter of supplying the correct socket ID to
+DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
+indirectly (through data structure-specific allocation API's such as
+``rte_ring_create``). Using these API's also ensures that mapping of externally
+allocated memory for DMA is also performed on any memory segment that is added
+to a DPDK malloc heap.
+
+Since there is no way DPDK can verify whether memory is available or valid, this
+responsibility falls on the shoulders of the user. All multiprocess
 synchronization is also user's responsibility, as well as ensuring  that all
 calls to add/attach/detach/remove memory are done in the correct order. It is
 not required to attach to a memory area in all processes - only attach to memory
@@ -246,6 +255,37 @@ The expected workflow is as follows:
 For more information, please refer to ``rte_malloc`` API documentation,
 specifically the ``rte_malloc_heap_*`` family of function calls.
 
++ Using externally allocated memory without DPDK API's
+
+While using heap API's is the recommended method of using externally allocated
+memory in DPDK, there are certain use cases where the overhead of DPDK heap API
+is undesirable - for example, when manual memory management is performed on an
+externally allocated area. To support use cases where externally allocated
+memory will not be used as part of normal DPDK workflow, there is also another
+set of API's under the ``rte_extmem_*`` namespace.
+
+These API's are (as their name implies) intended to allow registering or
+unregistering externally allocated memory to/from DPDK's internal page table, to
+allow API's like ``rte_mem_virt2memseg`` etc. to work with externally allocated
+memory. Memory added this way will not be available for any regular DPDK
+allocators; DPDK will leave this memory for the user application to manage.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Register memory within DPDK
+    - If IOVA table is not specified, IOVA addresses will be assumed to be
+      unavailable
+* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
+* Use the memory area in your application
+* If memory area is no longer needed, it can be unregistered
+    - If the area was mapped for DMA, unmapping must be performed before
+      unregistering memory
+
+Since these externally allocated memory areas will not be managed by DPDK, it is
+therefore up to the user application to decide how to use them and what to do
+with them once they're registered.
+
 Per-lcore and Shared Variables
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index d47ea4938..a2e085ae8 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -24,6 +24,7 @@
 #include "eal_memalloc.h"
 #include "eal_private.h"
 #include "eal_internal_cfg.h"
+#include "malloc_heap.h"
 
 /*
  * Try to mmap *size bytes in /dev/zero. If it is successful, return the
@@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
 	return ret;
 }
 
+int __rte_experimental
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int socket_id;
+	int ret = 0;
+
+	if (va_addr == NULL || page_sz == 0 || len == 0 ||
+			!rte_is_power_of_2(page_sz) ||
+			RTE_ALIGN(len, page_sz) != len) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* make sure the segment doesn't already exist */
+	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
+		rte_errno = EEXIST;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* get next available socket ID */
+	socket_id = mcfg->next_socket_id;
+	if (socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we can create a new memseg */
+	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
+			page_sz, "extmem", socket_id) == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
+	/* memseg list successfully created - increment next socket ID */
+	mcfg->next_socket_id++;
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_extmem_unregister(void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int ret = 0;
+
+	if (va_addr == NULL || len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our segment */
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+
+	ret = malloc_heap_destroy_external_seg(msl);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index d970825df..4a43c1a9e 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -423,6 +423,69 @@ int __rte_experimental
 rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
 		size_t *offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register external memory chunk with DPDK.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to register
+ * @param len
+ *   Length of virtual area to register
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if  ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EEXIST - memory chunk is already registered
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister external memory chunk with DPDK.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA unmapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to unregister
+ * @param len
+ *   Length of virtual area to unregister
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     ENOENT - memory chunk was not found
+ */
+int __rte_experimental
+rte_extmem_unregister(void *va_addr, size_t len);
+
 /**
  * Dump the physical memory layout to a file.
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 3fe78260d..593691a14 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -296,6 +296,8 @@ EXPERIMENTAL {
 	rte_devargs_remove;
 	rte_devargs_type_count;
 	rte_eal_cleanup;
+	rte_extmem_register;
+	rte_extmem_unregister;
 	rte_fbarray_attach;
 	rte_fbarray_destroy;
 	rte_fbarray_detach;
-- 
2.17.1


* [dpdk-dev] [PATCH 4/4] mem: allow usage of non-heap external memory in multiprocess
  2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
                   ` (2 preceding siblings ...)
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas Anatoly Burakov
@ 2018-11-29 13:48 ` Anatoly Burakov
  2018-12-14  9:56   ` Yongseok Koh
  2018-12-02  5:48 ` [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Shahaf Shuler
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Anatoly Burakov @ 2018-11-29 13:48 UTC
  To: dev
  Cc: John McNamara, Marko Kovacevic, shahafs, yskoh, thomas, shreyansh.jain

Add multiprocess support for externally allocated memory areas that
are not added to DPDK heap (and add relevant doc sections).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../prog_guide/env_abstraction_layer.rst      |  3 +
 lib/librte_eal/common/eal_common_memory.c     | 42 +++++++++++++
 lib/librte_eal/common/include/rte_memory.h    | 59 +++++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  2 +
 4 files changed, 106 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d7799b626..b0491bf2d 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -276,11 +276,14 @@ The expected workflow is as follows:
 * Register memory within DPDK
     - If IOVA table is not specified, IOVA addresses will be assumed to be
       unavailable
+    - Other processes must attach to the memory area before they can use it
 * Perform DMA mapping with ``rte_vfio_dma_map`` if needed
 * Use the memory area in your application
 * If memory area is no longer needed, it can be unregistered
     - If the area was mapped for DMA, unmapping must be performed before
       unregistering memory
+    - Other processes must detach from the memory area before it can be
+      unregistered
 
 Since these externally allocated memory areas will not be managed by DPDK, it is
 therefore up to the user application to decide how to use them and what to do
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index a2e085ae8..67b445c31 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -849,6 +849,48 @@ rte_extmem_unregister(void *va_addr, size_t len)
 	return ret;
 }
 
+static int
+sync_memory(void *va_addr, size_t len, bool attach)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int ret = 0;
+
+	if (va_addr == NULL || len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our segment */
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (attach)
+		ret = rte_fbarray_attach(&msl->memseg_arr);
+	else
+		ret = rte_fbarray_detach(&msl->memseg_arr);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_extmem_attach(void *va_addr, size_t len)
+{
+	return sync_memory(va_addr, len, true);
+}
+
+int __rte_experimental
+rte_extmem_detach(void *va_addr, size_t len)
+{
+	return sync_memory(va_addr, len, false);
+}
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 4a43c1a9e..050bb6d8e 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -435,6 +435,10 @@ rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
  * @note This API will not perform any DMA mapping. It is expected that user
  *   will do that themselves.
  *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling
+ *   ``rte_extmem_attach``.
+ *
  * @param va_addr
  *   Start of virtual area to register
  * @param len
@@ -472,6 +476,9 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
  * @note This API will not perform any DMA unmapping. It is expected that user
  *   will do that themselves.
  *
+ * @note Before calling this function, all other processes must call
+ *   ``rte_extmem_detach`` to detach from the memory area.
+ *
  * @param va_addr
  *   Start of virtual area to unregister
  * @param len
@@ -486,6 +493,58 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 int __rte_experimental
 rte_extmem_unregister(void *va_addr, size_t len);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Attach to external memory chunk registered in another process.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to attach to
+ * @param len
+ *   Length of virtual area to attach to
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     ENOENT - memory chunk was not found
+ */
+int __rte_experimental
+rte_extmem_attach(void *va_addr, size_t len);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Detach from external memory chunk registered in another process.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA unmapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to detach from
+ * @param len
+ *   Length of virtual area to detach from
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     ENOENT - memory chunk was not found
+ */
+int __rte_experimental
+rte_extmem_detach(void *va_addr, size_t len);
+
 /**
  * Dump the physical memory layout to a file.
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 593691a14..eb5f7b9cb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -296,6 +296,8 @@ EXPERIMENTAL {
 	rte_devargs_remove;
 	rte_devargs_type_count;
 	rte_eal_cleanup;
+	rte_extmem_attach;
+	rte_extmem_detach;
 	rte_extmem_register;
 	rte_extmem_unregister;
 	rte_fbarray_attach;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc
  2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
                   ` (3 preceding siblings ...)
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
@ 2018-12-02  5:48 ` Shahaf Shuler
  2018-12-02 23:28   ` Yongseok Koh
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 29+ messages in thread
From: Shahaf Shuler @ 2018-12-02  5:48 UTC (permalink / raw)
  To: Anatoly Burakov, dev; +Cc: Yongseok Koh, Thomas Monjalon, shreyansh.jain

Hi Anatoly, 

Thursday, November 29, 2018 3:49 PM, Anatoly Burakov:
> Subject: [PATCH 0/4] Allow using external memory without malloc
> 
> Currently, the only way to use externally allocated memory is through
> rte_malloc API's. While this is fine for a lot of use cases, it may not be suitable
> for certain other use cases like manual memory management, etc.
> 
> This patchset adds another API to register memory segments with DPDK (so
> that API's like ``rte_mem_virt2memseg`` could be relied on by PMD's and
> such), but not create a malloc heap out of them.
> 
> Aside from the obvious (not adding memory to a heap), the other major
> difference between this API and the ``rte_malloc_heap_*`` external memory
> functions is the fact that no DMA mapping is performed automatically.
> 
> This really draws a line in the sand, and there are now two ways of doing
> things - do everything automatically (using the ``rte_malloc_heap_*`` API's),
> or do everything manually (``rte_extmem_*`` and future DMA mapping API
> [1] that would replace ``rte_vfio_dma_map``). This way, the consistency of
> API is kept, and flexibility is also allowed.
> 

As you know, I like the idea.
One question, though: do you see a use case for an application to have externally allocated memory that needs to be registered with the DPDK subsystem but is not used for DMA?
My only guess would be some helper libraries which require memory allocation from the user (however, that doesn't seem like a good API).

If there is no such use case, maybe it is better to merge the two (rte_extmem_* and rte_dma_map) into a single call for the app to register and DMA map the memory. rte_mem_virt2memseg is not something the application needs to understand; it is used internally by PMDs or other libs.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc
  2018-12-02  5:48 ` [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Shahaf Shuler
@ 2018-12-02 23:28   ` Yongseok Koh
  2018-12-03 10:23     ` Burakov, Anatoly
  0 siblings, 1 reply; 29+ messages in thread
From: Yongseok Koh @ 2018-12-02 23:28 UTC (permalink / raw)
  To: Shahaf Shuler; +Cc: Anatoly Burakov, dev, Thomas Monjalon, shreyansh.jain


> On Dec 1, 2018, at 9:48 PM, Shahaf Shuler <shahafs@mellanox.com> wrote:
> 
> Hi Anatoly, 
> 
> Thursday, November 29, 2018 3:49 PM, Anatoly Burakov:
>> Subject: [PATCH 0/4] Allow using external memory without malloc
>> 
>> Currently, the only way to use externally allocated memory is through
>> rte_malloc API's. While this is fine for a lot of use cases, it may not be suitable
>> for certain other use cases like manual memory management, etc.
>> 
>> This patchset adds another API to register memory segments with DPDK (so
>> that API's like ``rte_mem_virt2memseg`` could be relied on by PMD's and
>> such), but not create a malloc heap out of them.
>> 
>> Aside from the obvious (not adding memory to a heap), the other major
>> difference between this API and the ``rte_malloc_heap_*`` external memory
>> functions is the fact that no DMA mapping is performed automatically.
>> 
>> This really draws a line in the sand, and there are now two ways of doing
>> things - do everything automatically (using the ``rte_malloc_heap_*`` API's),
>> or do everything manually (``rte_extmem_*`` and future DMA mapping API
>> [1] that would replace ``rte_vfio_dma_map``). This way, the consistency of
>> API is kept, and flexibility is also allowed.
>> 
> 
> As you know I like the idea.
> One question though, do you see a use case for application to have externally allocated memory which needs to be registered to the DPDK subsystem however not being used for DMA?
> My only guess would be so helper libraries which requires the memory allocation from user (however it doesn't seems like a good API). 
> 
> If no use case, maybe it is better to merge between the two (rte_extmem_* and rte_dma_map) to have a single call for app to register and DMA map the memory. The rte_mem_virt2memseg is not something application needs to understand, it is used internally by PMDs or other libs. 

Just FYI.
My implementation for mlx4/5 doesn't need a separate DMA registration via
rte_dma_map() as long as the memory is included in the memseg list. Registration
is done only when an Lkey lookup misses, and only the mem free event is used to
release it. From my end, the reason we wanted a generic DMA registration API was
that some people don't want to use the new API to include the ext mem in the
memseg list, but want to simply call the API for DMA registration.

In a nutshell, mlx4/5 needs users to call either rte_extmem_register() or
rte_dma_map(). However, it is no problem to call both.
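
The "register on lookup miss, release on mem free event" behaviour described
above can be sketched roughly as follows. This is only an illustrative toy, not
the mlx4/5 code: the names mr_cache, mr_cache_get() and mr_cache_on_free() are
hypothetical, the key is a plain counter, and the real driver would call into
the device to register memory instead of filling a table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device memory-region cache; not mlx4/5 code. */
#define MR_CACHE_MAX 8

struct mr_entry {
	uintptr_t start;
	size_t len;
	uint32_t lkey;
};

struct mr_cache {
	struct mr_entry entries[MR_CACHE_MAX];
	unsigned int n_entries;
	unsigned int n_registrations; /* how often a miss forced a registration */
	uint32_t next_lkey;
};

/* Return the key covering addr, or 0 on a miss (0 is reserved as invalid). */
static uint32_t
mr_cache_lookup(const struct mr_cache *c, uintptr_t addr)
{
	unsigned int i;

	for (i = 0; i < c->n_entries; i++) {
		const struct mr_entry *e = &c->entries[i];

		if (addr >= e->start && addr - e->start < e->len)
			return e->lkey;
	}
	return 0;
}

/* Look up addr; only on a miss, register the area [start, start + len). */
static uint32_t
mr_cache_get(struct mr_cache *c, uintptr_t addr, uintptr_t start, size_t len)
{
	uint32_t lkey = mr_cache_lookup(c, addr);
	struct mr_entry *e;

	if (lkey != 0)
		return lkey; /* hit: no new registration */
	if (addr < start || addr - start >= len)
		return 0; /* addr is not inside the area offered */
	if (c->n_entries == MR_CACHE_MAX)
		return 0; /* cache full */
	e = &c->entries[c->n_entries++];
	e->start = start;
	e->len = len;
	e->lkey = ++c->next_lkey;
	c->n_registrations++;
	return e->lkey;
}

/* Drop every entry overlapping a freed area, as a mem free event would. */
static void
mr_cache_on_free(struct mr_cache *c, uintptr_t start, size_t len)
{
	unsigned int i = 0;

	while (i < c->n_entries) {
		struct mr_entry *e = &c->entries[i];

		if (e->start < start + len && start < e->start + e->len)
			*e = c->entries[--c->n_entries]; /* swap in the last entry */
		else
			i++;
	}
}
```

With a zero-initialised struct mr_cache, the first mr_cache_get() for an address
registers the area, later calls for addresses in the same area reuse the key,
and mr_cache_on_free() invalidates it again.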


Thanks,
Yongseok


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc
  2018-12-02 23:28   ` Yongseok Koh
@ 2018-12-03 10:23     ` Burakov, Anatoly
  2018-12-12 12:55       ` Yongseok Koh
  0 siblings, 1 reply; 29+ messages in thread
From: Burakov, Anatoly @ 2018-12-03 10:23 UTC (permalink / raw)
  To: Yongseok Koh, Shahaf Shuler; +Cc: dev, Thomas Monjalon, shreyansh.jain

On 02-Dec-18 11:28 PM, Yongseok Koh wrote:
> 
>> On Dec 1, 2018, at 9:48 PM, Shahaf Shuler <shahafs@mellanox.com> wrote:
>>
>> Hi Anatoly,
>>
>> Thursday, November 29, 2018 3:49 PM, Anatoly Burakov:
>>> Subject: [PATCH 0/4] Allow using external memory without malloc
>>>
>>> Currently, the only way to use externally allocated memory is through
>>> rte_malloc API's. While this is fine for a lot of use cases, it may not be suitable
>>> for certain other use cases like manual memory management, etc.
>>>
>>> This patchset adds another API to register memory segments with DPDK (so
>>> that API's like ``rte_mem_virt2memseg`` could be relied on by PMD's and
>>> such), but not create a malloc heap out of them.
>>>
>>> Aside from the obvious (not adding memory to a heap), the other major
>>> difference between this API and the ``rte_malloc_heap_*`` external memory
>>> functions is the fact that no DMA mapping is performed automatically.
>>>
>>> This really draws a line in the sand, and there are now two ways of doing
>>> things - do everything automatically (using the ``rte_malloc_heap_*`` API's),
>>> or do everything manually (``rte_extmem_*`` and future DMA mapping API
>>> [1] that would replace ``rte_vfio_dma_map``). This way, the consistency of
>>> API is kept, and flexibility is also allowed.
>>>
>>
>> As you know I like the idea.
>> One question though, do you see a use case for application to have externally allocated memory which needs to be registered to the DPDK subsystem however not being used for DMA?
>> My only guess would be so helper libraries which requires the memory allocation from user (however it doesn't seems like a good API).
>>
>> If no use case, maybe it is better to merge between the two (rte_extmem_* and rte_dma_map) to have a single call for app to register and DMA map the memory. The rte_mem_virt2memseg is not something application needs to understand, it is used internally by PMDs or other libs.
> 
> Just FYI.
> My implementation for mlx4/5 doesn't need to have a separate registration for
> DMA by rte_dma_map() as long as it is included in the memseg list. Registration
> is done only if Lkey lookup misses and only mem free event is taken to release
> it. From my end, the reason why we wanted to have a generic DMA registration was
> because some people doesn't want to use the new API to make the ext mem included
> in the memseg list but want to simply call the API for DMA registration.
> 
> In a nutshell, mlx4/5 needs users use either rte_extmem_register() or
> rte_dma_map(). However, it is no problem to call both.

It would be good to create a segment when using rte_dma_map().
Unfortunately, that's not realistic :)

Registering memory within DPDK does not necessarily have to be performed
by the primary process - whichever process wants to create the table
can do so, and later processes have to attach to the memory area.
There's also no way to know if a memory segment can be attached to - this
is a question only the application can answer.

In other words, there's no way to combine rte_dma_map() and
rte_extmem_register() into one call and keep multiprocess support.
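
To make the ordering concrete, here is a minimal sketch of the bookkeeping an
application might keep around rte_extmem_register(), rte_extmem_attach(),
rte_extmem_detach() and rte_extmem_unregister(). It is an assumption-laden toy:
the extmem_track_* helpers and struct extmem_state are hypothetical
application-side names, they do not call into DPDK, and the -EBUSY check only
models the documented rule that other processes must detach before the area is
unregistered.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical application-side record of one external memory area. */
struct extmem_state {
	bool registered;       /* some process has registered the area */
	unsigned int attached; /* other processes currently attached */
};

/* Creating process: the real call here would be rte_extmem_register(). */
static int
extmem_track_register(struct extmem_state *s)
{
	if (s->registered)
		return -EEXIST;
	s->registered = true;
	s->attached = 0;
	return 0;
}

/* Any later process: the real call here would be rte_extmem_attach(). */
static int
extmem_track_attach(struct extmem_state *s)
{
	if (!s->registered)
		return -ENOENT; /* nothing to attach to yet */
	s->attached++;
	return 0;
}

/* Later process going away: the real call would be rte_extmem_detach(). */
static int
extmem_track_detach(struct extmem_state *s)
{
	if (!s->registered || s->attached == 0)
		return -ENOENT;
	s->attached--;
	return 0;
}

/* Creating process: the real call would be rte_extmem_unregister(). */
static int
extmem_track_unregister(struct extmem_state *s)
{
	if (!s->registered)
		return -ENOENT;
	if (s->attached != 0)
		return -EBUSY; /* others must detach first */
	s->registered = false;
	return 0;
}
```

How the state is shared between processes is left open here, which matches the
point above: only the application knows which processes can attach and when.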



-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc
  2018-12-03 10:23     ` Burakov, Anatoly
@ 2018-12-12 12:55       ` Yongseok Koh
  2018-12-12 13:17         ` Burakov, Anatoly
  0 siblings, 1 reply; 29+ messages in thread
From: Yongseok Koh @ 2018-12-12 12:55 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: Shahaf Shuler, dev, Thomas Monjalon, shreyansh.jain

On Mon, Dec 03, 2018 at 10:23:03AM +0000, Burakov, Anatoly wrote:
> On 02-Dec-18 11:28 PM, Yongseok Koh wrote:
> > 
> > > On Dec 1, 2018, at 9:48 PM, Shahaf Shuler <shahafs@mellanox.com> wrote:
> > > 
> > > Hi Anatoly,
> > > 
> > > Thursday, November 29, 2018 3:49 PM, Anatoly Burakov:
> > > > Subject: [PATCH 0/4] Allow using external memory without malloc
> > > > 
> > > > Currently, the only way to use externally allocated memory is through
> > > > rte_malloc API's. While this is fine for a lot of use cases, it may not be suitable
> > > > for certain other use cases like manual memory management, etc.
> > > > 
> > > > This patchset adds another API to register memory segments with DPDK (so
> > > > that API's like ``rte_mem_virt2memseg`` could be relied on by PMD's and
> > > > such), but not create a malloc heap out of them.
> > > > 
> > > > Aside from the obvious (not adding memory to a heap), the other major
> > > > difference between this API and the ``rte_malloc_heap_*`` external memory
> > > > functions is the fact that no DMA mapping is performed automatically.
> > > > 
> > > > This really draws a line in the sand, and there are now two ways of doing
> > > > things - do everything automatically (using the ``rte_malloc_heap_*`` API's),
> > > > or do everything manually (``rte_extmem_*`` and future DMA mapping API
> > > > [1] that would replace ``rte_vfio_dma_map``). This way, the consistency of
> > > > API is kept, and flexibility is also allowed.
> > > > 
> > > 
> > > As you know I like the idea.
> > > One question though, do you see a use case for application to have externally allocated memory which needs to be registered to the DPDK subsystem however not being used for DMA?
> > > My only guess would be so helper libraries which requires the memory allocation from user (however it doesn't seems like a good API).
> > > 
> > > If no use case, maybe it is better to merge between the two (rte_extmem_* and rte_dma_map) to have a single call for app to register and DMA map the memory. The rte_mem_virt2memseg is not something application needs to understand, it is used internally by PMDs or other libs.
> > 
> > Just FYI.
> > My implementation for mlx4/5 doesn't need to have a separate registration for
> > DMA by rte_dma_map() as long as it is included in the memseg list. Registration
> > is done only if Lkey lookup misses and only mem free event is taken to release
> > it. From my end, the reason why we wanted to have a generic DMA registration was
> > because some people doesn't want to use the new API to make the ext mem included
> > in the memseg list but want to simply call the API for DMA registration.
> > 
> > In a nutshell, mlx4/5 needs users use either rte_extmem_register() or
> > rte_dma_map(). However, it is no problem to call both.
> 
> It would be good to create a segment when using rte_dma_map().
> Unfortunately, that's not realistic :)
> 
> Registering memory within DPDK does not necessarily have to be performed by
> the primary process - whichever process that wants to create the table, can
> do so, and later processes have to attach to the memory area. There's also
> no way to know if memory segment can be attached to - this is a question
> only application can answer.
> 
> In other words, there's no way to combine rte_dma_map() and
> rte_extmem_register() into one call and keep multiprocess support.

Sorry for the late reply. I was away for a while.

I understand your point that rte_dma_map() can't create a segment, but isn't the
opposite possible? I still have a question about
rte_extmem_register/unregister/attach/detach(). Why don't these APIs generate
memory events? Are memory events by definition limited to memory used for
malloc? What if some app wants to receive the events even for extmem? What
makes the difference between the two types of extmem (one for a malloc heap and
the other for just a memseg) when it comes to generating events?

I've reviewed your patches and they all look good :-) But this is still unclear
to me.


Thanks,
Yongseok


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc
  2018-12-12 12:55       ` Yongseok Koh
@ 2018-12-12 13:17         ` Burakov, Anatoly
  0 siblings, 0 replies; 29+ messages in thread
From: Burakov, Anatoly @ 2018-12-12 13:17 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: Shahaf Shuler, dev, Thomas Monjalon, shreyansh.jain

On 12-Dec-18 12:55 PM, Yongseok Koh wrote:
> On Mon, Dec 03, 2018 at 10:23:03AM +0000, Burakov, Anatoly wrote:
>> On 02-Dec-18 11:28 PM, Yongseok Koh wrote:
>>>
>>>> On Dec 1, 2018, at 9:48 PM, Shahaf Shuler <shahafs@mellanox.com> wrote:
>>>>
>>>> Hi Anatoly,
>>>>
>>>> Thursday, November 29, 2018 3:49 PM, Anatoly Burakov:
>>>>> Subject: [PATCH 0/4] Allow using external memory without malloc
>>>>>
>>>>> Currently, the only way to use externally allocated memory is through
>>>>> rte_malloc API's. While this is fine for a lot of use cases, it may not be suitable
>>>>> for certain other use cases like manual memory management, etc.
>>>>>
>>>>> This patchset adds another API to register memory segments with DPDK (so
>>>>> that API's like ``rte_mem_virt2memseg`` could be relied on by PMD's and
>>>>> such), but not create a malloc heap out of them.
>>>>>
>>>>> Aside from the obvious (not adding memory to a heap), the other major
>>>>> difference between this API and the ``rte_malloc_heap_*`` external memory
>>>>> functions is the fact that no DMA mapping is performed automatically.
>>>>>
>>>>> This really draws a line in the sand, and there are now two ways of doing
>>>>> things - do everything automatically (using the ``rte_malloc_heap_*`` API's),
>>>>> or do everything manually (``rte_extmem_*`` and future DMA mapping API
>>>>> [1] that would replace ``rte_vfio_dma_map``). This way, the consistency of
>>>>> API is kept, and flexibility is also allowed.
>>>>>
>>>>
>>>> As you know I like the idea.
>>>> One question though, do you see a use case for application to have externally allocated memory which needs to be registered to the DPDK subsystem however not being used for DMA?
>>>> My only guess would be so helper libraries which requires the memory allocation from user (however it doesn't seems like a good API).
>>>>
>>>> If no use case, maybe it is better to merge between the two (rte_extmem_* and rte_dma_map) to have a single call for app to register and DMA map the memory. The rte_mem_virt2memseg is not something application needs to understand, it is used internally by PMDs or other libs.
>>>
>>> Just FYI.
>>> My implementation for mlx4/5 doesn't need to have a separate registration for
>>> DMA by rte_dma_map() as long as it is included in the memseg list. Registration
>>> is done only if Lkey lookup misses and only mem free event is taken to release
>>> it. From my end, the reason why we wanted to have a generic DMA registration was
>>> because some people doesn't want to use the new API to make the ext mem included
>>> in the memseg list but want to simply call the API for DMA registration.
>>>
>>> In a nutshell, mlx4/5 needs users use either rte_extmem_register() or
>>> rte_dma_map(). However, it is no problem to call both.
>>
>> It would be good to create a segment when using rte_dma_map().
>> Unfortunately, that's not realistic :)
>>
>> Registering memory within DPDK does not necessarily have to be performed by
>> the primary process - whichever process that wants to create the table, can
>> do so, and later processes have to attach to the memory area. There's also
>> no way to know if memory segment can be attached to - this is a question
>> only application can answer.
>>
>> In other words, there's no way to combine rte_dma_map() and
>> rte_extmem_register() into one call and keep multiprocess support.
> 
> Sorry for late reply. I was away for a while.
> 
> I understood your point that rte_dma_map() can't create a segment but isn't the
> opposite possible? I still have a question about
> rte_extmem_register/unregister/attach/detach(). Why don't these APIs generate
> memory events? Do you define the memory events are limited to memories for
> malloc? What if some app wants to know the events even if it is extmem? What
> makes difference between two types of extmem (one for malloc heap and the other
> for just memseg) in generating the events?
> 
> I've reviewed your patches and all look good :-) But it is still unclear to me.

Hi Yongseok,

Idealistically speaking, my view is, if you want the luxury of DPDK
doing everything for you, use malloc heaps. If you don't - you're on
your own :) Any callbacks etc. that you might want to get if you're
*not* using malloc are not really my problem - they're yours, because
*you* don't want to use built-in DPDK facilities, for whatever reason.

Pragmatically speaking, right now the reason to not do so is to avoid
that memory being automatically mapped for DMA, because VFIO currently
uses the callback mechanism to subscribe to notifications. When they are
decoupled - we can talk about making it so that registered external
memory still triggers callbacks (although I do not see why, to be honest).
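
The coupling described above can be illustrated with a small sketch. It is a
toy model, not EAL code: mem_events, heap_add_external(),
extmem_register_only() and count_dma_maps() are hypothetical names. It only
mirrors the split in this patchset, where adding external memory to a malloc
heap notifies subscribers with an ALLOC event while plain registration does
not, so a subscriber that maps memory for DMA never sees registered-only areas.

```c
#include <stddef.h>

#define MAX_CALLBACKS 4

/* Hypothetical event types, loosely modelled on alloc/free notifications. */
enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

typedef void (*mem_event_cb_t)(enum mem_event ev, const void *addr,
		size_t len, void *arg);

struct mem_events {
	mem_event_cb_t cb[MAX_CALLBACKS];
	void *arg[MAX_CALLBACKS];
	unsigned int n;
};

/* Add a subscriber; returns -1 when the table is full. */
static int
mem_events_subscribe(struct mem_events *me, mem_event_cb_t cb, void *arg)
{
	if (me->n == MAX_CALLBACKS)
		return -1;
	me->cb[me->n] = cb;
	me->arg[me->n] = arg;
	me->n++;
	return 0;
}

/* Call every subscriber in subscription order. */
static void
mem_events_notify(const struct mem_events *me, enum mem_event ev,
		const void *addr, size_t len)
{
	unsigned int i;

	for (i = 0; i < me->n; i++)
		me->cb[i](ev, addr, len, me->arg[i]);
}

/* Heap path: memory joins a heap and subscribers are told about it. */
static void
heap_add_external(const struct mem_events *me, const void *addr, size_t len)
{
	mem_events_notify(me, MEM_EVENT_ALLOC, addr, len);
}

/* Registration-only path: the area is recorded, but no event fires. */
static void
extmem_register_only(const struct mem_events *me, const void *addr,
		size_t len)
{
	(void)me;
	(void)addr;
	(void)len;
}

/* Example subscriber standing in for automatic DMA mapping. */
static void
count_dma_maps(enum mem_event ev, const void *addr, size_t len, void *arg)
{
	(void)addr;
	(void)len;
	if (ev == MEM_EVENT_ALLOC)
		(*(unsigned int *)arg)++;
}
```

In this model, a counter passed to count_dma_maps() stays at zero after
extmem_register_only() and only increases after heap_add_external().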



-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [dpdk-dev] [PATCH 1/4] malloc: separate creating memseg list and malloc heap
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
@ 2018-12-14  9:33   ` Yongseok Koh
  0 siblings, 0 replies; 29+ messages in thread
From: Yongseok Koh @ 2018-12-14  9:33 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Shahaf Shuler, Thomas Monjalon, shreyansh.jain

On Thu, Nov 29, 2018 at 01:48:32PM +0000, Anatoly Burakov wrote:
> Currently, creating external malloc heap involves also creating
> a memseg list backing that malloc heap. We need to have them as
> separate functions, to allow creating memseg lists without
> creating a malloc heap.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---

Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks

>  lib/librte_eal/common/malloc_heap.c | 34 ++++++++++++++++++-----------
>  lib/librte_eal/common/malloc_heap.h |  9 ++++++--
>  lib/librte_eal/common/rte_malloc.c  | 11 ++++++++--
>  3 files changed, 37 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
> index c6a6d4f6b..25693481f 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -1095,9 +1095,10 @@ destroy_seg(struct malloc_elem *elem, size_t len)
>  	return 0;
>  }
>  
> -int
> -malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
> -		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
> +struct rte_memseg_list *
> +malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz, const char *seg_name,
> +		unsigned int socket_id)
>  {
>  	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>  	char fbarray_name[RTE_FBARRAY_NAME_LEN];
> @@ -1117,17 +1118,17 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
>  	if (msl == NULL) {
>  		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
>  		rte_errno = ENOSPC;
> -		return -1;
> +		return NULL;
>  	}
>  
>  	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
> -			heap->name, va_addr);
> +			seg_name, va_addr);
>  
>  	/* create the backing fbarray */
>  	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
>  			sizeof(struct rte_memseg)) < 0) {
>  		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
> -		return -1;
> +		return NULL;
>  	}
>  	arr = &msl->memseg_arr;
>  
> @@ -1143,32 +1144,39 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
>  		ms->len = page_sz;
>  		ms->nchannel = rte_memory_get_nchannel();
>  		ms->nrank = rte_memory_get_nrank();
> -		ms->socket_id = heap->socket_id;
> +		ms->socket_id = socket_id;
>  	}
>  
>  	/* set up the memseg list */
>  	msl->base_va = va_addr;
>  	msl->page_sz = page_sz;
> -	msl->socket_id = heap->socket_id;
> +	msl->socket_id = socket_id;
>  	msl->len = seg_len;
>  	msl->version = 0;
>  	msl->external = 1;
>  
> +	return msl;
> +}
> +
> +int
> +malloc_heap_add_external_memory(struct malloc_heap *heap,
> +		struct rte_memseg_list *msl)
> +{
>  	/* erase contents of new memory */
> -	memset(va_addr, 0, seg_len);
> +	memset(msl->base_va, 0, msl->len);
>  
>  	/* now, add newly minted memory to the malloc heap */
> -	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
> +	malloc_heap_add_memory(heap, msl, msl->base_va, msl->len);
>  
> -	heap->total_size += seg_len;
> +	heap->total_size += msl->len;
>  
>  	/* all done! */
>  	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
> -			heap->name, va_addr);
> +			heap->name, msl->base_va);
>  
>  	/* notify all subscribers that a new memory area has been added */
>  	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
> -			va_addr, seg_len);
> +			msl->base_va, msl->len);
>  
>  	return 0;
>  }
> diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
> index e48996d52..255a315b8 100644
> --- a/lib/librte_eal/common/malloc_heap.h
> +++ b/lib/librte_eal/common/malloc_heap.h
> @@ -39,9 +39,14 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
>  int
>  malloc_heap_destroy(struct malloc_heap *heap);
>  
> +struct rte_memseg_list *
> +malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz, const char *seg_name,
> +		unsigned int socket_id);
> +
>  int
> -malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
> -		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
> +malloc_heap_add_external_memory(struct malloc_heap *heap,
> +		struct rte_memseg_list *msl);
>  
>  int
>  malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
> diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
> index 0da5ad5e8..66bfe63c3 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -340,6 +340,7 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
>  {
>  	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>  	struct malloc_heap *heap = NULL;
> +	struct rte_memseg_list *msl;
>  	unsigned int n;
>  	int ret;
>  
> @@ -373,9 +374,15 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
>  		goto unlock;
>  	}
>  
> +	msl = malloc_heap_create_external_seg(va_addr, iova_addrs, n, page_sz,
> +			heap_name, heap->socket_id);
> +	if (msl == NULL) {
> +		ret = -1;
> +		goto unlock;
> +	}
> +
>  	rte_spinlock_lock(&heap->lock);
> -	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
> -			page_sz);
> +	ret = malloc_heap_add_external_memory(heap, msl);
>  	rte_spinlock_unlock(&heap->lock);
>  
>  unlock:
> -- 
> 2.17.1

* Re: [dpdk-dev] [PATCH 2/4] malloc: separate destroying memseg list and heap data
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
@ 2018-12-14  9:34   ` Yongseok Koh
  0 siblings, 0 replies; 29+ messages in thread
From: Yongseok Koh @ 2018-12-14  9:34 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Shahaf Shuler, Thomas Monjalon, shreyansh.jain

On Thu, Nov 29, 2018 at 01:48:33PM +0000, Anatoly Burakov wrote:
> Currently, destroying an external heap chunk and its memseg list is
> done in a single step. Once we gain the ability to unregister
> external memory from DPDK that doesn't have any heap structures
> associated with it, we will need to be able to find and destroy
> memseg lists and heap data separately.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---

Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks
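
For readers following along, the lookup pattern used by extseg_walk() and
malloc_heap_find_external_seg() in this patch can be sketched in isolation
as below. This is a simplified stand-in, not DPDK code: struct seg_list,
the lists[] array, walk_lists() and find_list() are hypothetical names that
only mimic the shape of rte_memseg_list, mcfg->memsegs and the walk
function. The point it illustrates is that the callback receives a const
pointer, so a writable pointer is recovered through the entry's index in
the backing array.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rte_memseg_list (compared fields only). */
struct seg_list {
	void *base_va;
	size_t len;
};

#define MAX_LISTS 4
/* Hypothetical stand-in for the mcfg->memsegs array. */
static struct seg_list lists[MAX_LISTS];

typedef int (*walk_fn)(const struct seg_list *sl, void *arg);

/* Simplified walk: skip unused slots, stop at the first nonzero return. */
static int
walk_lists(walk_fn fn, void *arg)
{
	size_t i;

	for (i = 0; i < MAX_LISTS; i++) {
		int ret;

		if (lists[i].base_va == NULL)
			continue;
		ret = fn(&lists[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

struct find_arg {
	void *va_addr;
	size_t len;
	struct seg_list *found;
};

/* Callback: return 1 on an exact (address, length) match, 0 otherwise. */
static int
find_cb(const struct seg_list *sl, void *arg)
{
	struct find_arg *fa = arg;

	if (sl->base_va == fa->va_addr && sl->len == fa->len) {
		/* sl is const: recover a writable pointer via its index */
		fa->found = &lists[sl - lists];
		return 1;
	}
	return 0;
}

/* Return the matching entry, or NULL if the walk found nothing. */
static struct seg_list *
find_list(void *va_addr, size_t len)
{
	struct find_arg fa = { va_addr, len, NULL };

	return walk_lists(find_cb, &fa) == 1 ? fa.found : NULL;
}
```

Keeping the search separate from the teardown is what lets the same lookup
serve the heap-backed and the heap-less code paths.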

>  lib/librte_eal/common/malloc_heap.c |  70 +++++++++++++++----
>  lib/librte_eal/common/malloc_heap.h |   6 ++
>  lib/librte_eal/common/rte_malloc.c  | 104 ++++++++++------------------
>  3 files changed, 102 insertions(+), 78 deletions(-)
> 
> diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
> index 25693481f..fa0cb0799 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -1067,12 +1067,9 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
>  }
>  
>  static int
> -destroy_seg(struct malloc_elem *elem, size_t len)
> +destroy_elem(struct malloc_elem *elem, size_t len)
>  {
>  	struct malloc_heap *heap = elem->heap;
> -	struct rte_memseg_list *msl;
> -
> -	msl = elem->msl;
>  
>  	/* notify all subscribers that a memory area is going to be removed */
>  	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, elem, len);
> @@ -1085,13 +1082,6 @@ destroy_seg(struct malloc_elem *elem, size_t len)
>  
>  	memset(elem, 0, sizeof(*elem));
>  
> -	/* destroy the fbarray backing this memory */
> -	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
> -		return -1;
> -
> -	/* reset the memseg list */
> -	memset(msl, 0, sizeof(*msl));
> -
>  	return 0;
>  }
>  
> @@ -1158,6 +1148,62 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
>  	return msl;
>  }
>  
> +struct extseg_walk_arg {
> +	void *va_addr;
> +	size_t len;
> +	struct rte_memseg_list *msl;
> +};
> +
> +static int
> +extseg_walk(const struct rte_memseg_list *msl, void *arg)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	struct extseg_walk_arg *wa = arg;
> +
> +	if (msl->base_va == wa->va_addr && msl->len == wa->len) {
> +		unsigned int found_idx;
> +
> +		/* msl is const */
> +		found_idx = msl - mcfg->memsegs;
> +		wa->msl = &mcfg->memsegs[found_idx];
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +struct rte_memseg_list *
> +malloc_heap_find_external_seg(void *va_addr, size_t len)
> +{
> +	struct extseg_walk_arg wa;
> +	int res;
> +
> +	wa.va_addr = va_addr;
> +	wa.len = len;
> +
> +	res = rte_memseg_list_walk_thread_unsafe(extseg_walk, &wa);
> +
> +	if (res != 1) {
> +		/* 0 means nothing was found, -1 shouldn't happen */
> +		if (res == 0)
> +			rte_errno = ENOENT;
> +		return NULL;
> +	}
> +	return wa.msl;
> +}
> +
> +int
> +malloc_heap_destroy_external_seg(struct rte_memseg_list *msl)
> +{
> +	/* destroy the fbarray backing this memory */
> +	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
> +		return -1;
> +
> +	/* reset the memseg list */
> +	memset(msl, 0, sizeof(*msl));
> +
> +	return 0;
> +}
> +
>  int
>  malloc_heap_add_external_memory(struct malloc_heap *heap,
>  		struct rte_memseg_list *msl)
> @@ -1206,7 +1252,7 @@ malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
>  		rte_errno = EBUSY;
>  		return -1;
>  	}
> -	return destroy_seg(elem, len);
> +	return destroy_elem(elem, len);
>  }
>  
>  int
> diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
> index 255a315b8..ca9ff666f 100644
> --- a/lib/librte_eal/common/malloc_heap.h
> +++ b/lib/librte_eal/common/malloc_heap.h
> @@ -44,6 +44,12 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
>  		unsigned int n_pages, size_t page_sz, const char *seg_name,
>  		unsigned int socket_id);
>  
> +struct rte_memseg_list *
> +malloc_heap_find_external_seg(void *va_addr, size_t len);
> +
> +int
> +malloc_heap_destroy_external_seg(struct rte_memseg_list *msl);
> +
>  int
>  malloc_heap_add_external_memory(struct malloc_heap *heap,
>  		struct rte_memseg_list *msl);
> diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
> index 66bfe63c3..9a82e3386 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -396,6 +396,7 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
>  {
>  	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>  	struct malloc_heap *heap = NULL;
> +	struct rte_memseg_list *msl;
>  	int ret;
>  
>  	if (heap_name == NULL || va_addr == NULL || len == 0 ||
> @@ -420,9 +421,19 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
>  		goto unlock;
>  	}
>  
> +	msl = malloc_heap_find_external_seg(va_addr, len);
> +	if (msl == NULL) {
> +		ret = -1;
> +		goto unlock;
> +	}
> +
>  	rte_spinlock_lock(&heap->lock);
>  	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
>  	rte_spinlock_unlock(&heap->lock);
> +	if (ret != 0)
> +		goto unlock;
> +
> +	ret = malloc_heap_destroy_external_seg(msl);
>  
>  unlock:
>  	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> @@ -430,63 +441,12 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
>  	return ret;
>  }
>  
> -struct sync_mem_walk_arg {
> -	void *va_addr;
> -	size_t len;
> -	int result;
> -	bool attach;
> -};
> -
> -static int
> -sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
> -{
> -	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> -	struct sync_mem_walk_arg *wa = arg;
> -	size_t len = msl->page_sz * msl->memseg_arr.len;
> -
> -	if (msl->base_va == wa->va_addr &&
> -			len == wa->len) {
> -		struct rte_memseg_list *found_msl;
> -		int msl_idx, ret;
> -
> -		/* msl is const */
> -		msl_idx = msl - mcfg->memsegs;
> -		found_msl = &mcfg->memsegs[msl_idx];
> -
> -		if (wa->attach) {
> -			ret = rte_fbarray_attach(&found_msl->memseg_arr);
> -		} else {
> -			/* notify all subscribers that a memory area is about to
> -			 * be removed
> -			 */
> -			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
> -					msl->base_va, msl->len);
> -			ret = rte_fbarray_detach(&found_msl->memseg_arr);
> -		}
> -
> -		if (ret < 0) {
> -			wa->result = -rte_errno;
> -		} else {
> -			/* notify all subscribers that a new memory area was
> -			 * added
> -			 */
> -			if (wa->attach)
> -				eal_memalloc_mem_event_notify(
> -						RTE_MEM_EVENT_ALLOC,
> -						msl->base_va, msl->len);
> -			wa->result = 0;
> -		}
> -		return 1;
> -	}
> -	return 0;
> -}
> -
>  static int
>  sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
>  {
>  	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>  	struct malloc_heap *heap = NULL;
> -	struct sync_mem_walk_arg wa;
> +	struct rte_memseg_list *msl;
>  	int ret;
>  
>  	if (heap_name == NULL || va_addr == NULL || len == 0 ||
> @@ -513,23 +473,35 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
>  	}
>  
>  	/* find corresponding memseg list to sync to */
> -	wa.va_addr = va_addr;
> -	wa.len = len;
> -	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
> -	wa.attach = attach;
> -
> -	/* we're already holding a read lock */
> -	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
> -
> -	if (wa.result < 0) {
> -		rte_errno = -wa.result;
> +	msl = malloc_heap_find_external_seg(va_addr, len);
> +	if (msl == NULL) {
>  		ret = -1;
> -	} else {
> -		/* notify all subscribers that a new memory area was added */
> -		if (attach)
> +		goto unlock;
> +	}
> +
> +	if (attach) {
> +		ret = rte_fbarray_attach(&msl->memseg_arr);
> +		if (ret == 0) {
> +			/* notify all subscribers that a new memory area was
> +			 * added.
> +			 */
>  			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
>  					va_addr, len);
> -		ret = 0;
> +		} else {
> +			ret = -1;
> +			goto unlock;
> +		}
> +	} else {
> +		/* notify all subscribers that a memory area is about to
> +		 * be removed.
> +		 */
> +		eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
> +				msl->base_va, msl->len);
> +		ret = rte_fbarray_detach(&msl->memseg_arr);
> +		if (ret < 0) {
> +			ret = -1;
> +			goto unlock;
> +		}
>  	}
>  unlock:
>  	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
> -- 
> 2.17.1

* Re: [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas Anatoly Burakov
@ 2018-12-14  9:55   ` Yongseok Koh
  2018-12-14 11:03     ` Burakov, Anatoly
  0 siblings, 1 reply; 29+ messages in thread
From: Yongseok Koh @ 2018-12-14  9:55 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, John McNamara, Marko Kovacevic, Shahaf Shuler,
	Thomas Monjalon, shreyansh.jain

On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
> The general use-case of using external memory is well covered by
> existing external memory API's. However, certain use cases require
> manual management of externally allocated memory areas, so this
> memory should not be added to the heap. It should, however, be
> added to DPDK's internal structures, so that API's like
> ``rte_mem_virt2memseg`` would work on such external memory segments.
> 
> This commit adds such an API to DPDK, along with documentation for it.
> The new functions allow registering and unregistering externally
> allocated memory areas.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
>  lib/librte_eal/common/eal_common_memory.c     | 74 +++++++++++++++++++
>  lib/librte_eal/common/include/rte_memory.h    | 63 ++++++++++++++++
>  lib/librte_eal/rte_eal_version.map            |  2 +
>  4 files changed, 189 insertions(+), 10 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 8b5d050c7..d7799b626 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
>  Support for Externally Allocated Memory
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> -It is possible to use externally allocated memory in DPDK, using a set of malloc
> -heap API's. Support for externally allocated memory is implemented through
> -overloading the socket ID - externally allocated heaps will have socket ID's
> -that would be considered invalid under normal circumstances. Requesting an
> -allocation to take place from a specified externally allocated memory is a
> -matter of supplying the correct socket ID to DPDK allocator, either directly
> -(e.g. through a call to ``rte_malloc``) or indirectly (through data
> -structure-specific allocation API's such as ``rte_ring_create``).
> +It is possible to use externally allocated memory in DPDK. There are two ways in
> +which using externally allocated memory can work: the malloc heap API's, and
> +manual memory management.
>  
> -Since there is no way DPDK can verify whether memory are is available or valid,
> -this responsibility falls on the shoulders of the user. All multiprocess
> ++ Using heap API's for externally allocated memory
> +
> +Using a set of malloc heap API's is the recommended way to use externally
> +allocated memory in DPDK. In this way, support for externally allocated memory
> +is implemented through overloading the socket ID - externally allocated heaps
> +will have socket ID's that would be considered invalid under normal
> +circumstances. Requesting an allocation to take place from a specified
> +externally allocated memory is a matter of supplying the correct socket ID to
> +DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
> +indirectly (through data structure-specific allocation API's such as
> +``rte_ring_create``). Using these API's also ensures that mapping of externally
> +allocated memory for DMA is also performed on any memory segment that is added
> +to a DPDK malloc heap.
> +
> +Since there is no way DPDK can verify whether memory is available or valid, this
> +responsibility falls on the shoulders of the user. All multiprocess
>  synchronization is also user's responsibility, as well as ensuring  that all
>  calls to add/attach/detach/remove memory are done in the correct order. It is
>  not required to attach to a memory area in all processes - only attach to memory
> @@ -246,6 +255,37 @@ The expected workflow is as follows:
>  For more information, please refer to ``rte_malloc`` API documentation,
>  specifically the ``rte_malloc_heap_*`` family of function calls.
>  
> ++ Using externally allocated memory without DPDK API's
> +
> +While using heap API's is the recommended method of using externally allocated
> +memory in DPDK, there are certain use cases where the overhead of DPDK heap API
> +is undesirable - for example, when manual memory management is performed on an
> +externally allocated area. To support use cases where externally allocated
> +memory will not be used as part of normal DPDK workflow, there is also another
> +set of API's under the ``rte_extmem_*`` namespace.
> +
> +These API's are (as their name implies) intended to allow registering or
> +unregistering externally allocated memory to/from DPDK's internal page table, to
> +allow API's like ``rte_mem_virt2memseg`` etc. to work with externally allocated
> +memory. Memory added this way will not be available for any regular DPDK
> +allocators; DPDK will leave this memory for the user application to manage.
> +
> +The expected workflow is as follows:
> +
> +* Get a pointer to memory area
> +* Register memory within DPDK
> +    - If IOVA table is not specified, IOVA addresses will be assumed to be
> +      unavailable
> +* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
> +* Use the memory area in your application
> +* If memory area is no longer needed, it can be unregistered
> +    - If the area was mapped for DMA, unmapping must be performed before
> +      unregistering memory
> +
> +Since these externally allocated memory areas will not be managed by DPDK, it is
> +therefore up to the user application to decide how to use them and what to do
> +with them once they're registered.
> +
>  Per-lcore and Shared Variables
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index d47ea4938..a2e085ae8 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -24,6 +24,7 @@
>  #include "eal_memalloc.h"
>  #include "eal_private.h"
>  #include "eal_internal_cfg.h"
> +#include "malloc_heap.h"
>  
>  /*
>   * Try to mmap *size bytes in /dev/zero. If it is successful, return the
> @@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>  	return ret;
>  }
>  
> +int __rte_experimental
> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	unsigned int socket_id;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || page_sz == 0 || len == 0 ||
> +			!rte_is_power_of_2(page_sz) ||
> +			RTE_ALIGN(len, page_sz) != len) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}

Wouldn't it be better to add more sanity checks here? E.g., checking that
len / page_sz == n_pages, as rte_malloc_heap_memory_add() does. And what about
the alignment of va_addr? If I'm not mistaken, it should be page-aligned,
although rte_malloc_heap_memory_add() doesn't check that either... You might
also want to document that the granularity of these registrations is a page.

Otherwise,

Acked-by: Yongseok Koh <yskoh@mellanox.com>
Thanks
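
The additional checks suggested above could be sketched roughly as follows.
This is a standalone illustration under stated assumptions, not the
validation that ended up in DPDK: extmem_args_valid() is a hypothetical
helper name, and uint64_t stands in for rte_iova_t so that the snippet
compiles without DPDK headers. The page-count check is skipped when no IOVA
table is supplied, following the patch's own note that n_pages is ignored
in that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: validate arguments before registering an area. */
static bool
extmem_args_valid(const void *va_addr, size_t len,
		const uint64_t *iova_addrs, unsigned int n_pages,
		size_t page_sz)
{
	if (va_addr == NULL || len == 0 || page_sz == 0)
		return false;
	/* page size must be a power of two */
	if ((page_sz & (page_sz - 1)) != 0)
		return false;
	/* length must be a whole number of pages */
	if ((len & (page_sz - 1)) != 0)
		return false;
	/* start address must be page-aligned */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0)
		return false;
	/* if an IOVA table is given, it must have one entry per page */
	if (iova_addrs != NULL && len / page_sz != n_pages)
		return false;
	return true;
}
```

With 2M pages, for example, a 4M area starting at a 2M-aligned address
passes, while the same area shifted by 4K, or supplied with a one-entry
IOVA table, is rejected.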

> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* make sure the segment doesn't already exist */
> +	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
> +		rte_errno = EEXIST;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* get next available socket ID */
> +	socket_id = mcfg->next_socket_id;
> +	if (socket_id > INT32_MAX) {
> +		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
> +		rte_errno = ENOSPC;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* we can create a new memseg */
> +	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
> +			page_sz, "extmem", socket_id) == NULL) {
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* memseg list successfully created - increment next socket ID */
> +	mcfg->next_socket_id++;
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
> +int __rte_experimental
> +rte_extmem_unregister(void *va_addr, size_t len)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	struct rte_memseg_list *msl;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || len == 0) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}
> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* find our segment */
> +	msl = malloc_heap_find_external_seg(va_addr, len);
> +	if (msl == NULL) {
> +		rte_errno = ENOENT;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	ret = malloc_heap_destroy_external_seg(msl);
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
>  /* init memory subsystem */
>  int
>  rte_eal_memory_init(void)
> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> index d970825df..4a43c1a9e 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -423,6 +423,69 @@ int __rte_experimental
>  rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>  		size_t *offset);
>  
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Register external memory chunk with DPDK.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA mapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to register
> + * @param len
> + *   Length of virtual area to register
> + * @param iova_addrs
> + *   Array of page IOVA addresses corresponding to each page in this memory
> + *   area. Can be NULL, in which case page IOVA addresses will be set to
> + *   RTE_BAD_IOVA.
> + * @param n_pages
> + *   Number of elements in the iova_addrs array. Ignored if  ``iova_addrs``
> + *   is NULL.
> + * @param page_sz
> + *   Page size of the underlying memory
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     EEXIST - memory chunk is already registered
> + *     ENOSPC - no more space in internal config to store a new memory chunk
> + */
> +int __rte_experimental
> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Unregister external memory chunk with DPDK.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA unmapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to unregister
> + * @param len
> + *   Length of virtual area to unregister
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     ENOENT - memory chunk was not found
> + */
> +int __rte_experimental
> +rte_extmem_unregister(void *va_addr, size_t len);
> +
>  /**
>   * Dump the physical memory layout to a file.
>   *
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 3fe78260d..593691a14 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>  	rte_devargs_remove;
>  	rte_devargs_type_count;
>  	rte_eal_cleanup;
> +	rte_extmem_register;
> +	rte_extmem_unregister;
>  	rte_fbarray_attach;
>  	rte_fbarray_destroy;
>  	rte_fbarray_detach;
> -- 
> 2.17.1

* Re: [dpdk-dev] [PATCH 4/4] mem: allow usage of non-heap external memory in multiprocess
  2018-11-29 13:48 ` [dpdk-dev] [PATCH 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
@ 2018-12-14  9:56   ` Yongseok Koh
  0 siblings, 0 replies; 29+ messages in thread
From: Yongseok Koh @ 2018-12-14  9:56 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, John McNamara, Marko Kovacevic, Shahaf Shuler,
	Thomas Monjalon, shreyansh.jain

On Thu, Nov 29, 2018 at 01:48:35PM +0000, Anatoly Burakov wrote:
> Add multiprocess support for externally allocated memory areas that
> are not added to DPDK heap (and add relevant doc sections).
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---

Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks
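
For reference, the ordering rules this patch documents (other processes
attach before use; all of them detach, and any DMA mapping is removed,
before the area is unregistered) are left to the application to enforce.
One way an application might track them is sketched below. Everything here
is hypothetical bookkeeping: struct extmem_tracker and the tracker_*()
functions are not DPDK API, and how the attach count would be shared
between processes is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical application-side state for one external memory area. */
struct extmem_tracker {
	bool registered;         /* area has been registered with DPDK */
	bool dma_mapped;         /* area is currently mapped for DMA */
	unsigned int n_attached; /* other processes currently attached */
};

static int
tracker_register(struct extmem_tracker *t)
{
	if (t->registered)
		return -1;
	t->registered = true;
	return 0;
}

/* Attaching only makes sense once the area has been registered. */
static int
tracker_attach(struct extmem_tracker *t)
{
	if (!t->registered)
		return -1;
	t->n_attached++;
	return 0;
}

static int
tracker_detach(struct extmem_tracker *t)
{
	if (t->n_attached == 0)
		return -1;
	t->n_attached--;
	return 0;
}

static int
tracker_set_dma_mapped(struct extmem_tracker *t, bool mapped)
{
	if (!t->registered)
		return -1;
	t->dma_mapped = mapped;
	return 0;
}

/* Refuse to unregister while the area is DMA-mapped or still attached. */
static int
tracker_unregister(struct extmem_tracker *t)
{
	if (!t->registered || t->dma_mapped || t->n_attached != 0)
		return -1;
	t->registered = false;
	return 0;
}
```

In a real application each tracker call would sit next to the matching
rte_extmem_* or DMA mapping call, so a refused transition points at a
missing detach or unmap.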

>  .../prog_guide/env_abstraction_layer.rst      |  3 +
>  lib/librte_eal/common/eal_common_memory.c     | 42 +++++++++++++
>  lib/librte_eal/common/include/rte_memory.h    | 59 +++++++++++++++++++
>  lib/librte_eal/rte_eal_version.map            |  2 +
>  4 files changed, 106 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index d7799b626..b0491bf2d 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -276,11 +276,14 @@ The expected workflow is as follows:
>  * Register memory within DPDK
>      - If IOVA table is not specified, IOVA addresses will be assumed to be
>        unavailable
> +    - Other processes must attach to the memory area before they can use it
>  * Perform DMA mapping with ``rte_vfio_dma_map`` if needed
>  * Use the memory area in your application
>  * If memory area is no longer needed, it can be unregistered
>      - If the area was mapped for DMA, unmapping must be performed before
>        unregistering memory
> +    - Other processes must detach from the memory area before it can be
> +      unregistered
>  
>  Since these externally allocated memory areas will not be managed by DPDK, it is
>  therefore up to the user application to decide how to use them and what to do
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index a2e085ae8..67b445c31 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -849,6 +849,48 @@ rte_extmem_unregister(void *va_addr, size_t len)
>  	return ret;
>  }
>  
> +static int
> +sync_memory(void *va_addr, size_t len, bool attach)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	struct rte_memseg_list *msl;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || len == 0) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}
> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* find our segment */
> +	msl = malloc_heap_find_external_seg(va_addr, len);
> +	if (msl == NULL) {
> +		rte_errno = ENOENT;
> +		ret = -1;
> +		goto unlock;
> +	}
> +	if (attach)
> +		ret = rte_fbarray_attach(&msl->memseg_arr);
> +	else
> +		ret = rte_fbarray_detach(&msl->memseg_arr);
> +
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
> +int __rte_experimental
> +rte_extmem_attach(void *va_addr, size_t len)
> +{
> +	return sync_memory(va_addr, len, true);
> +}
> +
> +int __rte_experimental
> +rte_extmem_detach(void *va_addr, size_t len)
> +{
> +	return sync_memory(va_addr, len, false);
> +}
> +
>  /* init memory subsystem */
>  int
>  rte_eal_memory_init(void)
> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> index 4a43c1a9e..050bb6d8e 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -435,6 +435,10 @@ rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>   * @note This API will not perform any DMA mapping. It is expected that user
>   *   will do that themselves.
>   *
> + * @note Before this memory can be accessed in other processes, each of
> + *   those processes must attach to it by calling
> + *   ``rte_extmem_attach``.
> + *
>   * @param va_addr
>   *   Start of virtual area to register
>   * @param len
> @@ -472,6 +476,9 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>   * @note This API will not perform any DMA unmapping. It is expected that user
>   *   will do that themselves.
>   *
> + * @note Before calling this function, all other processes must call
> + *   ``rte_extmem_detach`` to detach from the memory area.
> + *
>   * @param va_addr
>   *   Start of virtual area to unregister
>   * @param len
> @@ -486,6 +493,58 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>  int __rte_experimental
>  rte_extmem_unregister(void *va_addr, size_t len);
>  
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Attach to external memory chunk registered in another process.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA mapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to attach to
> + * @param len
> + *   Length of virtual area to attach to
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     ENOENT - memory chunk was not found
> + */
> +int __rte_experimental
> +rte_extmem_attach(void *va_addr, size_t len);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Detach from external memory chunk registered in another process.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA unmapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to detach from
> + * @param len
> + *   Length of virtual area to detach from
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     ENOENT - memory chunk was not found
> + */
> +int __rte_experimental
> +rte_extmem_detach(void *va_addr, size_t len);
> +
>  /**
>   * Dump the physical memory layout to a file.
>   *
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 593691a14..eb5f7b9cb 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>  	rte_devargs_remove;
>  	rte_devargs_type_count;
>  	rte_eal_cleanup;
> +	rte_extmem_attach;
> +	rte_extmem_detach;
>  	rte_extmem_register;
>  	rte_extmem_unregister;
>  	rte_fbarray_attach;
> -- 
> 2.17.1

* Re: [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas
  2018-12-14  9:55   ` Yongseok Koh
@ 2018-12-14 11:03     ` Burakov, Anatoly
  0 siblings, 0 replies; 29+ messages in thread
From: Burakov, Anatoly @ 2018-12-14 11:03 UTC (permalink / raw)
  To: Yongseok Koh
  Cc: dev, John McNamara, Marko Kovacevic, Shahaf Shuler,
	Thomas Monjalon, shreyansh.jain

On 14-Dec-18 9:55 AM, Yongseok Koh wrote:
> On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
>> The general use-case of using external memory is well covered by
>> existing external memory API's. However, certain use cases require
>> manual management of externally allocated memory areas, so this
>> memory should not be added to the heap. It should, however, be
>> added to DPDK's internal structures, so that API's like
>> ``rte_virt2memseg`` would work on such external memory segments.
>>
>> This commit adds such an API to DPDK. The new functions will allow
>> to register and unregister externally allocated memory areas, as
>> well as documentation for them.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>   .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
>>   lib/librte_eal/common/eal_common_memory.c     | 74 +++++++++++++++++++
>>   lib/librte_eal/common/include/rte_memory.h    | 63 ++++++++++++++++
>>   lib/librte_eal/rte_eal_version.map            |  2 +
>>   4 files changed, 189 insertions(+), 10 deletions(-)
>>
>> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
>> index 8b5d050c7..d7799b626 100644
>> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
>> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
>> @@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
>>   Support for Externally Allocated Memory
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>   
>> -It is possible to use externally allocated memory in DPDK, using a set of malloc
>> -heap API's. Support for externally allocated memory is implemented through
>> -overloading the socket ID - externally allocated heaps will have socket ID's
>> -that would be considered invalid under normal circumstances. Requesting an
>> -allocation to take place from a specified externally allocated memory is a
>> -matter of supplying the correct socket ID to DPDK allocator, either directly
>> -(e.g. through a call to ``rte_malloc``) or indirectly (through data
>> -structure-specific allocation API's such as ``rte_ring_create``).
>> +It is possible to use externally allocated memory in DPDK. There are two ways in
>> +which using externally allocated memory can work: the malloc heap API's, and
>> +manual memory management.
>>   
>> -Since there is no way DPDK can verify whether memory are is available or valid,
>> -this responsibility falls on the shoulders of the user. All multiprocess
>> ++ Using heap API's for externally allocated memory
>> +
>> +Using using a set of malloc heap API's is the recommended way to use externally
>> +allocated memory in DPDK. In this way, support for externally allocated memory
>> +is implemented through overloading the socket ID - externally allocated heaps
>> +will have socket ID's that would be considered invalid under normal
>> +circumstances. Requesting an allocation to take place from a specified
>> +externally allocated memory is a matter of supplying the correct socket ID to
>> +DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
>> +indirectly (through data structure-specific allocation API's such as
>> +``rte_ring_create``). Using these API's also ensures that mapping of externally
>> +allocated memory for DMA is also performed on any memory segment that is added
>> +to a DPDK malloc heap.
>> +
>> +Since there is no way DPDK can verify whether memory is available or valid, this
>> +responsibility falls on the shoulders of the user. All multiprocess
>>   synchronization is also user's responsibility, as well as ensuring  that all
>>   calls to add/attach/detach/remove memory are done in the correct order. It is
>>   not required to attach to a memory area in all processes - only attach to memory
>> @@ -246,6 +255,37 @@ The expected workflow is as follows:
>>   For more information, please refer to ``rte_malloc`` API documentation,
>>   specifically the ``rte_malloc_heap_*`` family of function calls.
>>   
>> ++ Using externally allocated memory without DPDK API's
>> +
>> +While using heap API's is the recommended method of using externally allocated
>> +memory in DPDK, there are certain use cases where the overhead of DPDK heap API
>> +is undesirable - for example, when manual memory management is performed on an
>> +externally allocated area. To support use cases where externally allocated
>> +memory will not be used as part of normal DPDK workflow, there is also another
>> +set of API's under the ``rte_extmem_*`` namespace.
>> +
>> +These API's are (as their name implies) intended to allow registering or
>> +unregistering externally allocated memory to/from DPDK's internal page table, to
>> +allow API's like ``rte_virt2memseg`` etc. to work with externally allocated
>> +memory. Memory added this way will not be available for any regular DPDK
>> +allocators; DPDK will leave this memory for the user application to manage.
>> +
>> +The expected workflow is as follows:
>> +
>> +* Get a pointer to memory area
>> +* Register memory within DPDK
>> +    - If IOVA table is not specified, IOVA addresses will be assumed to be
>> +      unavailable
>> +* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
>> +* Use the memory area in your application
>> +* If memory area is no longer needed, it can be unregistered
>> +    - If the area was mapped for DMA, unmapping must be performed before
>> +      unregistering memory
>> +
>> +Since these externally allocated memory areas will not be managed by DPDK, it is
>> +therefore up to the user application to decide how to use them and what to do
>> +with them once they're registered.
>> +
>>   Per-lcore and Shared Variables
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>   
>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
>> index d47ea4938..a2e085ae8 100644
>> --- a/lib/librte_eal/common/eal_common_memory.c
>> +++ b/lib/librte_eal/common/eal_common_memory.c
>> @@ -24,6 +24,7 @@
>>   #include "eal_memalloc.h"
>>   #include "eal_private.h"
>>   #include "eal_internal_cfg.h"
>> +#include "malloc_heap.h"
>>   
>>   /*
>>    * Try to mmap *size bytes in /dev/zero. If it is successful, return the
>> @@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>>   	return ret;
>>   }
>>   
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> +		unsigned int n_pages, size_t page_sz)
>> +{
>> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> +	unsigned int socket_id;
>> +	int ret = 0;
>> +
>> +	if (va_addr == NULL || page_sz == 0 || len == 0 ||
>> +			!rte_is_power_of_2(page_sz) ||
>> +			RTE_ALIGN(len, page_sz) != len) {
>> +		rte_errno = EINVAL;
>> +		return -1;
>> +	}
> 
> Isn't it better to have more sanity check? E.g, (len / page_sz == n_pages) like
> rte_malloc_heap_memory_add(). And what about the alignment of va_addr? Shouldn't
> it be page-aligned if I'm not mistaken? rte_malloc_heap_memory_add() doesn't
> have it either... Also you might want to add it to documentation that
> granularity of these registrations is a page.
> 

Hi Yongseok,

Thanks for your review.

n_pages is allowed to be 0 if iova_addrs[] is NULL. However, you're
correct that more sanity checking and documentation re: page alignment
would be beneficial. I'll submit a v2.
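
For illustration only, the stricter checks suggested above could look
roughly like the following. This is a standalone sketch and not
necessarily what v2 will contain: the helper name extmem_args_ok is
hypothetical, and uint64_t stands in for rte_iova_t so the snippet
compiles without DPDK headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of DPDK: returns true if the arguments
 * pass the checks discussed in this thread (page-aligned start and
 * length, n_pages consistent with len when an IOVA table is given).
 */
bool
extmem_args_ok(const void *va_addr, size_t len, const uint64_t *iova_addrs,
		unsigned int n_pages, size_t page_sz)
{
	if (va_addr == NULL || len == 0 || page_sz == 0)
		return false;

	/* page size must be a power of two */
	if ((page_sz & (page_sz - 1)) != 0)
		return false;

	/* both the start address and the length must be page-aligned */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0 ||
			(len & (page_sz - 1)) != 0)
		return false;

	/* n_pages only matters when an IOVA table is supplied */
	if (iova_addrs != NULL && len / page_sz != n_pages)
		return false;

	return true;
}
```

With a 4K page size, a page-aligned 8K area passes with either a
two-entry IOVA table or a NULL table, while an unaligned start address
or a mismatched n_pages is rejected.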


> Otherwise,
> 
> Acked-by: Yongseok Koh <yskoh@mellanox.com>
> Thanks
> 
>> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>> +
>> +	/* make sure the segment doesn't already exist */
>> +	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
>> +		rte_errno = EEXIST;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* get next available socket ID */
>> +	socket_id = mcfg->next_socket_id;
>> +	if (socket_id > INT32_MAX) {
>> +		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
>> +		rte_errno = ENOSPC;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* we can create a new memseg */
>> +	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
>> +			page_sz, "extmem", socket_id) == NULL) {
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* memseg list successfully created - increment next socket ID */
>> +	mcfg->next_socket_id++;
>> +unlock:
>> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
>> +	return ret;
>> +}
>> +
>> +int __rte_experimental
>> +rte_extmem_unregister(void *va_addr, size_t len)
>> +{
>> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> +	struct rte_memseg_list *msl;
>> +	int ret = 0;
>> +
>> +	if (va_addr == NULL || len == 0) {
>> +		rte_errno = EINVAL;
>> +		return -1;
>> +	}
>> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>> +
>> +	/* find our segment */
>> +	msl = malloc_heap_find_external_seg(va_addr, len);
>> +	if (msl == NULL) {
>> +		rte_errno = ENOENT;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	ret = malloc_heap_destroy_external_seg(msl);
>> +unlock:
>> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
>> +	return ret;
>> +}
>> +
>>   /* init memory subsystem */
>>   int
>>   rte_eal_memory_init(void)
>> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
>> index d970825df..4a43c1a9e 100644
>> --- a/lib/librte_eal/common/include/rte_memory.h
>> +++ b/lib/librte_eal/common/include/rte_memory.h
>> @@ -423,6 +423,69 @@ int __rte_experimental
>>   rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>>   		size_t *offset);
>>   
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Register external memory chunk with DPDK.
>> + *
>> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
>> + *   API's.
>> + *
>> + * @note This API will not perform any DMA mapping. It is expected that user
>> + *   will do that themselves.
>> + *
>> + * @param va_addr
>> + *   Start of virtual area to register
>> + * @param len
>> + *   Length of virtual area to register
>> + * @param iova_addrs
>> + *   Array of page IOVA addresses corresponding to each page in this memory
>> + *   area. Can be NULL, in which case page IOVA addresses will be set to
>> + *   RTE_BAD_IOVA.
>> + * @param n_pages
>> + *   Number of elements in the iova_addrs array. Ignored if  ``iova_addrs``
>> + *   is NULL.
>> + * @param page_sz
>> + *   Page size of the underlying memory
>> + *
>> + * @return
>> + *   - 0 on success
>> + *   - -1 in case of error, with rte_errno set to one of the following:
>> + *     EINVAL - one of the parameters was invalid
>> + *     EEXIST - memory chunk is already registered
>> + *     ENOSPC - no more space in internal config to store a new memory chunk
>> + */
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> +		unsigned int n_pages, size_t page_sz);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Unregister external memory chunk with DPDK.
>> + *
>> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
>> + *   API's.
>> + *
>> + * @note This API will not perform any DMA unmapping. It is expected that user
>> + *   will do that themselves.
>> + *
>> + * @param va_addr
>> + *   Start of virtual area to unregister
>> + * @param len
>> + *   Length of virtual area to unregister
>> + *
>> + * @return
>> + *   - 0 on success
>> + *   - -1 in case of error, with rte_errno set to one of the following:
>> + *     EINVAL - one of the parameters was invalid
>> + *     ENOENT - memory chunk was not found
>> + */
>> +int __rte_experimental
>> +rte_extmem_unregister(void *va_addr, size_t len);
>> +
>>   /**
>>    * Dump the physical memory layout to a file.
>>    *
>> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
>> index 3fe78260d..593691a14 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>>   	rte_devargs_remove;
>>   	rte_devargs_type_count;
>>   	rte_eal_cleanup;
>> +	rte_extmem_register;
>> +	rte_extmem_unregister;
>>   	rte_fbarray_attach;
>>   	rte_fbarray_destroy;
>>   	rte_fbarray_detach;
>> -- 
>> 2.17.1
> 


-- 
Thanks,
Anatoly


* [dpdk-dev] [PATCH v2 0/4] Allow using external memory without malloc
  2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
                   ` (4 preceding siblings ...)
  2018-12-02  5:48 ` [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Shahaf Shuler
@ 2018-12-14 11:50 ` Anatoly Burakov
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
                     ` (4 more replies)
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
                   ` (3 subsequent siblings)
  9 siblings, 5 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-12-14 11:50 UTC (permalink / raw)
  To: dev; +Cc: shahafs, yskoh, thomas, shreyansh.jain

Currently, the only way to use externally allocated memory
is through rte_malloc API's. While this is fine for a lot
of use cases, it may not be suitable for certain other use
cases like manual memory management, etc.

This patchset adds another API to register memory segments
with DPDK (so that API's like ``rte_mem_virt2memseg`` could
be relied on by PMD's and such), but not create a malloc
heap out of them.

Aside from the obvious (not adding memory to a heap), the
other major differences between this API and the
``rte_malloc_heap_*`` external memory functions are that
no DMA mapping is performed automatically and no mem event
callbacks are triggered.

This really draws a line in the sand, and there are now two
ways of doing things - do everything automatically (using
the ``rte_malloc_heap_*`` API's), or do everything manually
(``rte_extmem_*`` and future DMA mapping API [1] that would
replace ``rte_vfio_dma_map``). This way, the consistency of
API is kept, and flexibility is also allowed.

[1] https://mails.dpdk.org/archives/dev/2018-November/118175.html

Anatoly Burakov (4):
  malloc: separate creating memseg list and malloc heap
  malloc: separate destroying memseg list and heap data
  mem: allow registering external memory areas
  mem: allow usage of non-heap external memory in multiprocess

 .../prog_guide/env_abstraction_layer.rst      |  63 +++++++--
 doc/guides/rel_notes/release_19_02.rst        |   6 +
 lib/librte_eal/common/eal_common_memory.c     | 119 +++++++++++++++++
 lib/librte_eal/common/include/rte_memory.h    | 122 ++++++++++++++++++
 lib/librte_eal/common/malloc_heap.c           | 104 +++++++++++----
 lib/librte_eal/common/malloc_heap.h           |  15 ++-
 lib/librte_eal/common/rte_malloc.c            | 115 +++++++----------
 lib/librte_eal/rte_eal_version.map            |   4 +
 8 files changed, 443 insertions(+), 105 deletions(-)

-- 
2.17.1


* [dpdk-dev] [PATCH v2 1/4] malloc: separate creating memseg list and malloc heap
  2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
                   ` (5 preceding siblings ...)
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
@ 2018-12-14 11:50 ` Anatoly Burakov
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-12-14 11:50 UTC (permalink / raw)
  To: dev; +Cc: shahafs, yskoh, thomas, shreyansh.jain

Currently, creating external malloc heap involves also creating
a memseg list backing that malloc heap. We need to have them as
separate functions, to allow creating memseg lists without
creating a malloc heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
 lib/librte_eal/common/malloc_heap.c | 34 ++++++++++++++++++-----------
 lib/librte_eal/common/malloc_heap.h |  9 ++++++--
 lib/librte_eal/common/rte_malloc.c  | 11 ++++++++--
 3 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index c6a6d4f6b..25693481f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1095,9 +1095,10 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 	return 0;
 }
 
-int
-malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
-		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+struct rte_memseg_list *
+malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	char fbarray_name[RTE_FBARRAY_NAME_LEN];
@@ -1117,17 +1118,17 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	if (msl == NULL) {
 		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
 		rte_errno = ENOSPC;
-		return -1;
+		return NULL;
 	}
 
 	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
-			heap->name, va_addr);
+			seg_name, va_addr);
 
 	/* create the backing fbarray */
 	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
 			sizeof(struct rte_memseg)) < 0) {
 		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
-		return -1;
+		return NULL;
 	}
 	arr = &msl->memseg_arr;
 
@@ -1143,32 +1144,39 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		ms->len = page_sz;
 		ms->nchannel = rte_memory_get_nchannel();
 		ms->nrank = rte_memory_get_nrank();
-		ms->socket_id = heap->socket_id;
+		ms->socket_id = socket_id;
 	}
 
 	/* set up the memseg list */
 	msl->base_va = va_addr;
 	msl->page_sz = page_sz;
-	msl->socket_id = heap->socket_id;
+	msl->socket_id = socket_id;
 	msl->len = seg_len;
 	msl->version = 0;
 	msl->external = 1;
 
+	return msl;
+}
+
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap,
+		struct rte_memseg_list *msl)
+{
 	/* erase contents of new memory */
-	memset(va_addr, 0, seg_len);
+	memset(msl->base_va, 0, msl->len);
 
 	/* now, add newly minted memory to the malloc heap */
-	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+	malloc_heap_add_memory(heap, msl, msl->base_va, msl->len);
 
-	heap->total_size += seg_len;
+	heap->total_size += msl->len;
 
 	/* all done! */
 	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
-			heap->name, va_addr);
+			heap->name, msl->base_va);
 
 	/* notify all subscribers that a new memory area has been added */
 	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
-			va_addr, seg_len);
+			msl->base_va, msl->len);
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index e48996d52..255a315b8 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,9 +39,14 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+struct rte_memseg_list *
+malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id);
+
 int
-malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
-		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+malloc_heap_add_external_memory(struct malloc_heap *heap,
+		struct rte_memseg_list *msl);
 
 int
 malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 0da5ad5e8..66bfe63c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -340,6 +340,7 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
+	struct rte_memseg_list *msl;
 	unsigned int n;
 	int ret;
 
@@ -373,9 +374,15 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 		goto unlock;
 	}
 
+	msl = malloc_heap_create_external_seg(va_addr, iova_addrs, n, page_sz,
+			heap_name, heap->socket_id);
+	if (msl == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
 	rte_spinlock_lock(&heap->lock);
-	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
-			page_sz);
+	ret = malloc_heap_add_external_memory(heap, msl);
 	rte_spinlock_unlock(&heap->lock);
 
 unlock:
-- 
2.17.1


* [dpdk-dev] [PATCH v2 2/4] malloc: separate destroying memseg list and heap data
  2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
                   ` (6 preceding siblings ...)
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
@ 2018-12-14 11:50 ` Anatoly Burakov
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 3/4] mem: allow registering external memory areas Anatoly Burakov
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
  9 siblings, 0 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-12-14 11:50 UTC (permalink / raw)
  To: dev; +Cc: shahafs, yskoh, thomas, shreyansh.jain

Currently, destroying an external heap chunk and its memseg list
is done as a single step. Once we gain the ability to unregister
external memory from DPDK that doesn't have any heap structures
associated with it, we need to be able to find and destroy
memseg lists separately from heap data.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
 lib/librte_eal/common/malloc_heap.c |  70 +++++++++++++++----
 lib/librte_eal/common/malloc_heap.h |   6 ++
 lib/librte_eal/common/rte_malloc.c  | 104 ++++++++++------------------
 3 files changed, 102 insertions(+), 78 deletions(-)

diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 25693481f..fa0cb0799 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1067,12 +1067,9 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 }
 
 static int
-destroy_seg(struct malloc_elem *elem, size_t len)
+destroy_elem(struct malloc_elem *elem, size_t len)
 {
 	struct malloc_heap *heap = elem->heap;
-	struct rte_memseg_list *msl;
-
-	msl = elem->msl;
 
 	/* notify all subscribers that a memory area is going to be removed */
 	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, elem, len);
@@ -1085,13 +1082,6 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 
 	memset(elem, 0, sizeof(*elem));
 
-	/* destroy the fbarray backing this memory */
-	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
-		return -1;
-
-	/* reset the memseg list */
-	memset(msl, 0, sizeof(*msl));
-
 	return 0;
 }
 
@@ -1158,6 +1148,62 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 	return msl;
 }
 
+struct extseg_walk_arg {
+	void *va_addr;
+	size_t len;
+	struct rte_memseg_list *msl;
+};
+
+static int
+extseg_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct extseg_walk_arg *wa = arg;
+
+	if (msl->base_va == wa->va_addr && msl->len == wa->len) {
+		unsigned int found_idx;
+
+		/* msl is const */
+		found_idx = msl - mcfg->memsegs;
+		wa->msl = &mcfg->memsegs[found_idx];
+		return 1;
+	}
+	return 0;
+}
+
+struct rte_memseg_list *
+malloc_heap_find_external_seg(void *va_addr, size_t len)
+{
+	struct extseg_walk_arg wa;
+	int res;
+
+	wa.va_addr = va_addr;
+	wa.len = len;
+
+	res = rte_memseg_list_walk_thread_unsafe(extseg_walk, &wa);
+
+	if (res != 1) {
+		/* 0 means nothing was found, -1 shouldn't happen */
+		if (res == 0)
+			rte_errno = ENOENT;
+		return NULL;
+	}
+	return wa.msl;
+}
+
+int
+malloc_heap_destroy_external_seg(struct rte_memseg_list *msl)
+{
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap,
 		struct rte_memseg_list *msl)
@@ -1206,7 +1252,7 @@ malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_errno = EBUSY;
 		return -1;
 	}
-	return destroy_seg(elem, len);
+	return destroy_elem(elem, len);
 }
 
 int
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 255a315b8..ca9ff666f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -44,6 +44,12 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz, const char *seg_name,
 		unsigned int socket_id);
 
+struct rte_memseg_list *
+malloc_heap_find_external_seg(void *va_addr, size_t len);
+
+int
+malloc_heap_destroy_external_seg(struct rte_memseg_list *msl);
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap,
 		struct rte_memseg_list *msl);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 66bfe63c3..9a82e3386 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -396,6 +396,7 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
+	struct rte_memseg_list *msl;
 	int ret;
 
 	if (heap_name == NULL || va_addr == NULL || len == 0 ||
@@ -420,9 +421,19 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 		goto unlock;
 	}
 
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
 	rte_spinlock_lock(&heap->lock);
 	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
 	rte_spinlock_unlock(&heap->lock);
+	if (ret != 0)
+		goto unlock;
+
+	ret = malloc_heap_destroy_external_seg(msl);
 
 unlock:
 	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
@@ -430,63 +441,12 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
-struct sync_mem_walk_arg {
-	void *va_addr;
-	size_t len;
-	int result;
-	bool attach;
-};
-
-static int
-sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
-{
-	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct sync_mem_walk_arg *wa = arg;
-	size_t len = msl->page_sz * msl->memseg_arr.len;
-
-	if (msl->base_va == wa->va_addr &&
-			len == wa->len) {
-		struct rte_memseg_list *found_msl;
-		int msl_idx, ret;
-
-		/* msl is const */
-		msl_idx = msl - mcfg->memsegs;
-		found_msl = &mcfg->memsegs[msl_idx];
-
-		if (wa->attach) {
-			ret = rte_fbarray_attach(&found_msl->memseg_arr);
-		} else {
-			/* notify all subscribers that a memory area is about to
-			 * be removed
-			 */
-			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
-					msl->base_va, msl->len);
-			ret = rte_fbarray_detach(&found_msl->memseg_arr);
-		}
-
-		if (ret < 0) {
-			wa->result = -rte_errno;
-		} else {
-			/* notify all subscribers that a new memory area was
-			 * added
-			 */
-			if (wa->attach)
-				eal_memalloc_mem_event_notify(
-						RTE_MEM_EVENT_ALLOC,
-						msl->base_va, msl->len);
-			wa->result = 0;
-		}
-		return 1;
-	}
-	return 0;
-}
-
 static int
 sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
-	struct sync_mem_walk_arg wa;
+	struct rte_memseg_list *msl;
 	int ret;
 
 	if (heap_name == NULL || va_addr == NULL || len == 0 ||
@@ -513,23 +473,35 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 	}
 
 	/* find corresponding memseg list to sync to */
-	wa.va_addr = va_addr;
-	wa.len = len;
-	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
-	wa.attach = attach;
-
-	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
-
-	if (wa.result < 0) {
-		rte_errno = -wa.result;
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
 		ret = -1;
-	} else {
-		/* notify all subscribers that a new memory area was added */
-		if (attach)
+		goto unlock;
+	}
+
+	if (attach) {
+		ret = rte_fbarray_attach(&msl->memseg_arr);
+		if (ret == 0) {
+			/* notify all subscribers that a new memory area was
+			 * added.
+			 */
 			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
 					va_addr, len);
-		ret = 0;
+		} else {
+			ret = -1;
+			goto unlock;
+		}
+	} else {
+		/* notify all subscribers that a memory area is about to
+		 * be removed.
+		 */
+		eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+				msl->base_va, msl->len);
+		ret = rte_fbarray_detach(&msl->memseg_arr);
+		if (ret < 0) {
+			ret = -1;
+			goto unlock;
+		}
 	}
 unlock:
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
-- 
2.17.1


* [dpdk-dev] [PATCH v2 3/4] mem: allow registering external memory areas
  2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
                   ` (7 preceding siblings ...)
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
@ 2018-12-14 11:50 ` Anatoly Burakov
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
  9 siblings, 0 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-12-14 11:50 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, shahafs, yskoh, thomas, shreyansh.jain

The general use-case of using external memory is well covered by
existing external memory API's. However, certain use cases require
manual management of externally allocated memory areas, so this
memory should not be added to the heap. It should, however, be
added to DPDK's internal structures, so that API's like
``rte_mem_virt2memseg`` would work on such external memory segments.

This commit adds such an API to DPDK, along with documentation
for it. The new functions allow registering and unregistering
externally allocated memory areas.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---

Notes:
    v2:
    - Do more stringent alignment checks
    - Fix a bug where n_pages was used as is without
      parameter checking

 .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
 doc/guides/rel_notes/release_19_02.rst        |  6 ++
 lib/librte_eal/common/eal_common_memory.c     | 77 +++++++++++++++++++
 lib/librte_eal/common/include/rte_memory.h    | 63 +++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  2 +
 5 files changed, 198 insertions(+), 10 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 8b5d050c7..d7799b626 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
 Support for Externally Allocated Memory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-It is possible to use externally allocated memory in DPDK, using a set of malloc
-heap API's. Support for externally allocated memory is implemented through
-overloading the socket ID - externally allocated heaps will have socket ID's
-that would be considered invalid under normal circumstances. Requesting an
-allocation to take place from a specified externally allocated memory is a
-matter of supplying the correct socket ID to DPDK allocator, either directly
-(e.g. through a call to ``rte_malloc``) or indirectly (through data
-structure-specific allocation API's such as ``rte_ring_create``).
+It is possible to use externally allocated memory in DPDK. There are two ways in
+which using externally allocated memory can work: the malloc heap API's, and
+manual memory management.
 
-Since there is no way DPDK can verify whether memory are is available or valid,
-this responsibility falls on the shoulders of the user. All multiprocess
++ Using heap API's for externally allocated memory
+
+Using a set of malloc heap API's is the recommended way to use externally
+allocated memory in DPDK. In this way, support for externally allocated memory
+is implemented through overloading the socket ID - externally allocated heaps
+will have socket ID's that would be considered invalid under normal
+circumstances. Requesting an allocation to take place from a specified
+externally allocated memory is a matter of supplying the correct socket ID to
+DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
+indirectly (through data structure-specific allocation API's such as
+``rte_ring_create``). Using these API's also ensures that mapping of externally
+allocated memory for DMA is also performed on any memory segment that is added
+to a DPDK malloc heap.
+
+Since there is no way DPDK can verify whether memory is available or valid, this
+responsibility falls on the shoulders of the user. All multiprocess
 synchronization is also user's responsibility, as well as ensuring  that all
 calls to add/attach/detach/remove memory are done in the correct order. It is
 not required to attach to a memory area in all processes - only attach to memory
@@ -246,6 +255,37 @@ The expected workflow is as follows:
 For more information, please refer to ``rte_malloc`` API documentation,
 specifically the ``rte_malloc_heap_*`` family of function calls.
 
++ Using externally allocated memory without DPDK API's
+
+While using heap API's is the recommended method of using externally allocated
+memory in DPDK, there are certain use cases where the overhead of DPDK heap API
+is undesirable - for example, when manual memory management is performed on an
+externally allocated area. To support use cases where externally allocated
+memory will not be used as part of normal DPDK workflow, there is also another
+set of API's under the ``rte_extmem_*`` namespace.
+
+These API's are (as their name implies) intended to allow registering or
+unregistering externally allocated memory to/from DPDK's internal page table, to
+allow API's like ``rte_mem_virt2memseg`` etc. to work with externally allocated
+memory. Memory added this way will not be available for any regular DPDK
+allocators; DPDK will leave this memory for the user application to manage.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Register memory within DPDK
+    - If IOVA table is not specified, IOVA addresses will be assumed to be
+      unavailable
+* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
+* Use the memory area in your application
+* If memory area is no longer needed, it can be unregistered
+    - If the area was mapped for DMA, unmapping must be performed before
+      unregistering memory
+
+Since these externally allocated memory areas will not be managed by DPDK, it is
+therefore up to the user application to decide how to use them and what to do
+with them once they're registered.
+
 Per-lcore and Shared Variables
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index e86ef9511..0b79918a9 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -54,6 +54,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added API to register external memory in DPDK.**
+
+  A new ``rte_extmem_register``/``rte_extmem_unregister`` API was added to allow
+  chunks of external memory to be registered with DPDK without adding them to
+  the malloc heap.
+
 * **Updated the enic driver.**
 
   * Added support for ``RTE_ETH_DEV_CLOSE_REMOVE`` flag.
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index d47ea4938..ea43c1362 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -24,6 +24,7 @@
 #include "eal_memalloc.h"
 #include "eal_private.h"
 #include "eal_internal_cfg.h"
+#include "malloc_heap.h"
 
 /*
  * Try to mmap *size bytes in /dev/zero. If it is successful, return the
@@ -775,6 +776,82 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
 	return ret;
 }
 
+int __rte_experimental
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int socket_id, n;
+	int ret = 0;
+
+	if (va_addr == NULL || page_sz == 0 || len == 0 ||
+			!rte_is_power_of_2(page_sz) ||
+			RTE_ALIGN(len, page_sz) != len ||
+			((len / page_sz) != n_pages && iova_addrs != NULL) ||
+			!rte_is_aligned(va_addr, page_sz)) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* make sure the segment doesn't already exist */
+	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
+		rte_errno = EEXIST;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* get next available socket ID */
+	socket_id = mcfg->next_socket_id;
+	if (socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we can create a new memseg */
+	n = len / page_sz;
+	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n,
+			page_sz, "extmem", socket_id) == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
+	/* memseg list successfully created - increment next socket ID */
+	mcfg->next_socket_id++;
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_extmem_unregister(void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int ret = 0;
+
+	if (va_addr == NULL || len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our segment */
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+
+	ret = malloc_heap_destroy_external_seg(msl);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index d970825df..ff23fc2c1 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -423,6 +423,69 @@ int __rte_experimental
 rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
 		size_t *offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register external memory chunk with DPDK.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to register. Must be aligned by ``page_sz``.
+ * @param len
+ *   Length of virtual area to register. Must be aligned by ``page_sz``.
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EEXIST - memory chunk is already registered
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister external memory chunk with DPDK.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA unmapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to unregister
+ * @param len
+ *   Length of virtual area to unregister
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     ENOENT - memory chunk was not found
+ */
+int __rte_experimental
+rte_extmem_unregister(void *va_addr, size_t len);
+
 /**
  * Dump the physical memory layout to a file.
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 3fe78260d..593691a14 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -296,6 +296,8 @@ EXPERIMENTAL {
 	rte_devargs_remove;
 	rte_devargs_type_count;
 	rte_eal_cleanup;
+	rte_extmem_register;
+	rte_extmem_unregister;
 	rte_fbarray_attach;
 	rte_fbarray_destroy;
 	rte_fbarray_detach;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [dpdk-dev] [PATCH v2 4/4] mem: allow usage of non-heap external memory in multiprocess
  2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
                   ` (8 preceding siblings ...)
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 3/4] mem: allow registering external memory areas Anatoly Burakov
@ 2018-12-14 11:50 ` Anatoly Burakov
  9 siblings, 0 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-12-14 11:50 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, shahafs, yskoh, thomas, shreyansh.jain

Add multiprocess support for externally allocated memory areas that
are not added to DPDK heap (and add relevant doc sections).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
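
Since ordering between processes is left to the application, one
possible way to keep track of it is sketched below. This is purely
illustrative and not part of the patch: ``struct ext_area`` and the
``area_*`` helpers are hypothetical, and DPDK itself performs no such
bookkeeping. The sketch only encodes the rule from the docs that other
processes must detach before the owner unregisters.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical application-side record for one registered area. */
struct ext_area {
	void *va_addr;
	size_t len;
	int registered;        /* 1 after the owner registered the area */
	unsigned int attached; /* number of other processes attached */
};

/* Another process attaches: only valid while the area is registered. */
static int area_attach(struct ext_area *a)
{
	if (a == NULL || !a->registered)
		return -ENOENT;
	a->attached++;
	return 0;
}

/* Another process detaches: must have attached before. */
static int area_detach(struct ext_area *a)
{
	if (a == NULL || !a->registered || a->attached == 0)
		return -ENOENT;
	a->attached--;
	return 0;
}

/* Owner unregisters: refused while any other process is attached. */
static int area_unregister(struct ext_area *a)
{
	if (a == NULL || !a->registered)
		return -ENOENT;
	if (a->attached != 0)
		return -EBUSY;
	a->registered = 0;
	return 0;
}
```

In a real multiprocess setup such a record would have to live in
memory shared between the processes and be protected by a lock; the
sketch leaves that out to keep the ordering rule visible.
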
 .../prog_guide/env_abstraction_layer.rst      |  3 +
 lib/librte_eal/common/eal_common_memory.c     | 42 +++++++++++++
 lib/librte_eal/common/include/rte_memory.h    | 59 +++++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  2 +
 4 files changed, 106 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index d7799b626..b0491bf2d 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -276,11 +276,14 @@ The expected workflow is as follows:
 * Register memory within DPDK
     - If IOVA table is not specified, IOVA addresses will be assumed to be
       unavailable
+    - Other processes must attach to the memory area before they can use it
 * Perform DMA mapping with ``rte_vfio_dma_map`` if needed
 * Use the memory area in your application
 * If memory area is no longer needed, it can be unregistered
     - If the area was mapped for DMA, unmapping must be performed before
       unregistering memory
+    - Other processes must detach from the memory area before it can be
+      unregistered
 
 Since these externally allocated memory areas will not be managed by DPDK, it is
 therefore up to the user application to decide how to use them and what to do
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index ea43c1362..051159f80 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -852,6 +852,48 @@ rte_extmem_unregister(void *va_addr, size_t len)
 	return ret;
 }
 
+static int
+sync_memory(void *va_addr, size_t len, bool attach)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int ret = 0;
+
+	if (va_addr == NULL || len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our segment */
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (attach)
+		ret = rte_fbarray_attach(&msl->memseg_arr);
+	else
+		ret = rte_fbarray_detach(&msl->memseg_arr);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_extmem_attach(void *va_addr, size_t len)
+{
+	return sync_memory(va_addr, len, true);
+}
+
+int __rte_experimental
+rte_extmem_detach(void *va_addr, size_t len)
+{
+	return sync_memory(va_addr, len, false);
+}
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index ff23fc2c1..7ca703bb1 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -435,6 +435,10 @@ rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
  * @note This API will not perform any DMA mapping. It is expected that user
  *   will do that themselves.
  *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling ``rte_extmem_attach`` in
+ *   each other process.
+ *
  * @param va_addr
  *   Start of virtual area to register. Must be aligned by ``page_sz``.
  * @param len
@@ -472,6 +476,9 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
  * @note This API will not perform any DMA unmapping. It is expected that user
  *   will do that themselves.
  *
+ * @note Before calling this function, all other processes must call
+ *   ``rte_extmem_detach`` to detach from the memory area.
+ *
  * @param va_addr
  *   Start of virtual area to unregister
  * @param len
@@ -486,6 +493,58 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 int __rte_experimental
 rte_extmem_unregister(void *va_addr, size_t len);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Attach to external memory chunk registered in another process.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to attach to
+ * @param len
+ *   Length of virtual area to attach to
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     ENOENT - memory chunk was not found
+ */
+int __rte_experimental
+rte_extmem_attach(void *va_addr, size_t len);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Detach from external memory chunk registered in another process.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA unmapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to detach from
+ * @param len
+ *   Length of virtual area to detach from
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     ENOENT - memory chunk was not found
+ */
+int __rte_experimental
+rte_extmem_detach(void *va_addr, size_t len);
+
 /**
  * Dump the physical memory layout to a file.
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 593691a14..eb5f7b9cb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -296,6 +296,8 @@ EXPERIMENTAL {
 	rte_devargs_remove;
 	rte_devargs_type_count;
 	rte_eal_cleanup;
+	rte_extmem_attach;
+	rte_extmem_detach;
 	rte_extmem_register;
 	rte_extmem_unregister;
 	rte_fbarray_attach;
-- 
2.17.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [dpdk-dev] [PATCH v3 0/4] Allow using external memory without malloc
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
@ 2018-12-20 15:32   ` Anatoly Burakov
  2018-12-20 16:16     ` Stephen Hemminger
  2018-12-20 17:17     ` Thomas Monjalon
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-12-20 15:32 UTC (permalink / raw)
  To: dev; +Cc: shahafs, yskoh, thomas, shreyansh.jain

Currently, the only way to use externally allocated memory
is through rte_malloc API's. While this is fine for a lot
of use cases, it may not be suitable for certain other use
cases like manual memory management, etc.

This patchset adds another API to register memory segments
with DPDK (so that API's like ``rte_mem_virt2memseg`` could
be relied on by PMD's and such), but not create a malloc
heap out of them.

Aside from the obvious (not adding memory to a heap), the
other major difference between this API and the
``rte_malloc_heap_*`` external memory functions is the fact
that no DMA mapping is performed automatically and no mem
event callbacks are triggered.

This really draws a line in the sand, and there are now two
ways of doing things - do everything automatically (using
the ``rte_malloc_heap_*`` API's), or do everything manually
(``rte_extmem_*`` and future DMA mapping API [1] that would
replace ``rte_vfio_dma_map``). This way, the consistency of
API is kept, and flexibility is also allowed.

[1] https://mails.dpdk.org/archives/dev/2018-November/118175.html

v3:
- Rebase on latest master

v2:
- More sanity checking of parameters

Anatoly Burakov (4):
  malloc: separate creating memseg list and malloc heap
  malloc: separate destroying memseg list and heap data
  mem: allow registering external memory areas
  mem: allow usage of non-heap external memory in multiprocess

 .../prog_guide/env_abstraction_layer.rst      |  63 +++++++--
 doc/guides/rel_notes/release_19_02.rst        |   7 +
 lib/librte_eal/common/eal_common_memory.c     | 119 +++++++++++++++++
 lib/librte_eal/common/include/rte_memory.h    | 122 ++++++++++++++++++
 lib/librte_eal/common/malloc_heap.c           | 104 +++++++++++----
 lib/librte_eal/common/malloc_heap.h           |  15 ++-
 lib/librte_eal/common/rte_malloc.c            | 116 +++++++----------
 lib/librte_eal/rte_eal_version.map            |   4 +
 8 files changed, 447 insertions(+), 103 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [dpdk-dev] [PATCH v3 1/4] malloc: separate creating memseg list and malloc heap
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
@ 2018-12-20 15:32   ` Anatoly Burakov
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-12-20 15:32 UTC (permalink / raw)
  To: dev; +Cc: shahafs, yskoh, thomas, shreyansh.jain

Currently, creating an external malloc heap also involves creating
the memseg list backing that malloc heap. We need to have them as
separate functions, to allow creating memseg lists without
creating a malloc heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
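
The split can be pictured with the toy model below. It is a standalone
sketch under simplifying assumptions, not the EAL code: ``struct
seg_list``, ``struct heap``, ``seg_create`` and ``heap_add_seg`` are
hypothetical, trimmed-down stand-ins for the memseg list, the malloc
heap and the two functions this patch separates.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the EAL structures. */
struct seg_list {
	void *base_va;
	size_t page_sz;
	size_t len;
	int socket_id;
	int external;
};

struct heap {
	size_t total_size;
};

/* Step 1: describe the memory area; no heap is involved. */
static struct seg_list *seg_create(struct seg_list *slot, void *va_addr,
		unsigned int n_pages, size_t page_sz, int socket_id)
{
	if (slot == NULL || va_addr == NULL || n_pages == 0 || page_sz == 0)
		return NULL;
	slot->base_va = va_addr;
	slot->page_sz = page_sz;
	slot->len = (size_t)n_pages * page_sz;
	slot->socket_id = socket_id;
	slot->external = 1;
	return slot;
}

/* Step 2 (optional): hand an existing segment list to a heap. */
static int heap_add_seg(struct heap *h, const struct seg_list *sl)
{
	if (h == NULL || sl == NULL)
		return -1;
	/* erase contents of new memory, as the heap path in the patch does */
	memset(sl->base_va, 0, sl->len);
	h->total_size += sl->len;
	return 0;
}
```

A caller that only wants the area to be known (the later
``rte_extmem_*`` case) stops after step 1, so the memory is neither
zeroed nor counted towards any heap.
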
 lib/librte_eal/common/malloc_heap.c | 34 ++++++++++++++++++-----------
 lib/librte_eal/common/malloc_heap.h |  9 ++++++--
 lib/librte_eal/common/rte_malloc.c  | 11 ++++++++--
 3 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 4c3632d02..e243a8f57 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1102,9 +1102,10 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 	return 0;
 }
 
-int
-malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
-		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+struct rte_memseg_list *
+malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	char fbarray_name[RTE_FBARRAY_NAME_LEN];
@@ -1124,17 +1125,17 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 	if (msl == NULL) {
 		RTE_LOG(ERR, EAL, "Couldn't find empty memseg list\n");
 		rte_errno = ENOSPC;
-		return -1;
+		return NULL;
 	}
 
 	snprintf(fbarray_name, sizeof(fbarray_name) - 1, "%s_%p",
-			heap->name, va_addr);
+			seg_name, va_addr);
 
 	/* create the backing fbarray */
 	if (rte_fbarray_init(&msl->memseg_arr, fbarray_name, n_pages,
 			sizeof(struct rte_memseg)) < 0) {
 		RTE_LOG(ERR, EAL, "Couldn't create fbarray backing the memseg list\n");
-		return -1;
+		return NULL;
 	}
 	arr = &msl->memseg_arr;
 
@@ -1150,32 +1151,39 @@ malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
 		ms->len = page_sz;
 		ms->nchannel = rte_memory_get_nchannel();
 		ms->nrank = rte_memory_get_nrank();
-		ms->socket_id = heap->socket_id;
+		ms->socket_id = socket_id;
 	}
 
 	/* set up the memseg list */
 	msl->base_va = va_addr;
 	msl->page_sz = page_sz;
-	msl->socket_id = heap->socket_id;
+	msl->socket_id = socket_id;
 	msl->len = seg_len;
 	msl->version = 0;
 	msl->external = 1;
 
+	return msl;
+}
+
+int
+malloc_heap_add_external_memory(struct malloc_heap *heap,
+		struct rte_memseg_list *msl)
+{
 	/* erase contents of new memory */
-	memset(va_addr, 0, seg_len);
+	memset(msl->base_va, 0, msl->len);
 
 	/* now, add newly minted memory to the malloc heap */
-	malloc_heap_add_memory(heap, msl, va_addr, seg_len);
+	malloc_heap_add_memory(heap, msl, msl->base_va, msl->len);
 
-	heap->total_size += seg_len;
+	heap->total_size += msl->len;
 
 	/* all done! */
 	RTE_LOG(DEBUG, EAL, "Added segment for heap %s starting at %p\n",
-			heap->name, va_addr);
+			heap->name, msl->base_va);
 
 	/* notify all subscribers that a new memory area has been added */
 	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
-			va_addr, seg_len);
+			msl->base_va, msl->len);
 
 	return 0;
 }
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index e48996d52..255a315b8 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -39,9 +39,14 @@ malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
 int
 malloc_heap_destroy(struct malloc_heap *heap);
 
+struct rte_memseg_list *
+malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id);
+
 int
-malloc_heap_add_external_memory(struct malloc_heap *heap, void *va_addr,
-		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+malloc_heap_add_external_memory(struct malloc_heap *heap,
+		struct rte_memseg_list *msl);
 
 int
 malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 06cf1e666..8a1747785 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -340,6 +340,7 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
+	struct rte_memseg_list *msl;
 	unsigned int n;
 	int ret;
 
@@ -371,9 +372,15 @@ rte_malloc_heap_memory_add(const char *heap_name, void *va_addr, size_t len,
 	}
 	n = len / page_sz;
 
+	msl = malloc_heap_create_external_seg(va_addr, iova_addrs, n, page_sz,
+			heap_name, heap->socket_id);
+	if (msl == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
 	rte_spinlock_lock(&heap->lock);
-	ret = malloc_heap_add_external_memory(heap, va_addr, iova_addrs, n,
-			page_sz);
+	ret = malloc_heap_add_external_memory(heap, msl);
 	rte_spinlock_unlock(&heap->lock);
 
 unlock:
-- 
2.17.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [dpdk-dev] [PATCH v3 2/4] malloc: separate destroying memseg list and heap data
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
@ 2018-12-20 15:32   ` Anatoly Burakov
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 3/4] mem: allow registering external memory areas Anatoly Burakov
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
  4 siblings, 0 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-12-20 15:32 UTC (permalink / raw)
  To: dev; +Cc: shahafs, yskoh, thomas, shreyansh.jain

Currently, destroying an external heap chunk and its memseg list
is a single step. Once we gain the ability to unregister external
memory that has no heap structures associated with it, we need to
be able to find and destroy memseg lists separately from heap
data.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
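
The lookup added here matches on the exact (address, length) pair. A
minimal standalone sketch of that behaviour follows; it is not the EAL
code, and ``struct seg_list``, ``find_seg`` and ``destroy_seg`` are
hypothetical simplifications of the memseg list, the walk callback and
the destroy helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for a memseg list. */
struct seg_list {
	void *base_va;
	size_t len;
};

/*
 * Find a segment list by exact match: both the start address and the
 * length must be equal, so a sub-range of a registered area is not found.
 */
static struct seg_list *find_seg(struct seg_list *lists, size_t n_lists,
		void *va_addr, size_t len)
{
	for (size_t i = 0; i < n_lists; i++) {
		if (lists[i].base_va == va_addr && lists[i].len == len)
			return &lists[i];
	}
	return NULL;
}

/* Reset the slot so it can be reused; heap data is handled elsewhere. */
static void destroy_seg(struct seg_list *sl)
{
	memset(sl, 0, sizeof(*sl));
}
```

One consequence of the exact match is that callers have to pass the
same address and length they used when the area was added; a partial
range is treated as not found.
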
 lib/librte_eal/common/malloc_heap.c |  70 +++++++++++++++----
 lib/librte_eal/common/malloc_heap.h |   6 ++
 lib/librte_eal/common/rte_malloc.c  | 105 +++++++++++-----------------
 3 files changed, 105 insertions(+), 76 deletions(-)

diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index e243a8f57..c5d254d8a 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -1074,12 +1074,9 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 }
 
 static int
-destroy_seg(struct malloc_elem *elem, size_t len)
+destroy_elem(struct malloc_elem *elem, size_t len)
 {
 	struct malloc_heap *heap = elem->heap;
-	struct rte_memseg_list *msl;
-
-	msl = elem->msl;
 
 	/* notify all subscribers that a memory area is going to be removed */
 	eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE, elem, len);
@@ -1092,13 +1089,6 @@ destroy_seg(struct malloc_elem *elem, size_t len)
 
 	memset(elem, 0, sizeof(*elem));
 
-	/* destroy the fbarray backing this memory */
-	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
-		return -1;
-
-	/* reset the memseg list */
-	memset(msl, 0, sizeof(*msl));
-
 	return 0;
 }
 
@@ -1165,6 +1155,62 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 	return msl;
 }
 
+struct extseg_walk_arg {
+	void *va_addr;
+	size_t len;
+	struct rte_memseg_list *msl;
+};
+
+static int
+extseg_walk(const struct rte_memseg_list *msl, void *arg)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct extseg_walk_arg *wa = arg;
+
+	if (msl->base_va == wa->va_addr && msl->len == wa->len) {
+		unsigned int found_idx;
+
+		/* msl is const */
+		found_idx = msl - mcfg->memsegs;
+		wa->msl = &mcfg->memsegs[found_idx];
+		return 1;
+	}
+	return 0;
+}
+
+struct rte_memseg_list *
+malloc_heap_find_external_seg(void *va_addr, size_t len)
+{
+	struct extseg_walk_arg wa;
+	int res;
+
+	wa.va_addr = va_addr;
+	wa.len = len;
+
+	res = rte_memseg_list_walk_thread_unsafe(extseg_walk, &wa);
+
+	if (res != 1) {
+		/* 0 means nothing was found, -1 shouldn't happen */
+		if (res == 0)
+			rte_errno = ENOENT;
+		return NULL;
+	}
+	return wa.msl;
+}
+
+int
+malloc_heap_destroy_external_seg(struct rte_memseg_list *msl)
+{
+	/* destroy the fbarray backing this memory */
+	if (rte_fbarray_destroy(&msl->memseg_arr) < 0)
+		return -1;
+
+	/* reset the memseg list */
+	memset(msl, 0, sizeof(*msl));
+
+	return 0;
+}
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap,
 		struct rte_memseg_list *msl)
@@ -1213,7 +1259,7 @@ malloc_heap_remove_external_memory(struct malloc_heap *heap, void *va_addr,
 		rte_errno = EBUSY;
 		return -1;
 	}
-	return destroy_seg(elem, len);
+	return destroy_elem(elem, len);
 }
 
 int
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 255a315b8..ca9ff666f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -44,6 +44,12 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz, const char *seg_name,
 		unsigned int socket_id);
 
+struct rte_memseg_list *
+malloc_heap_find_external_seg(void *va_addr, size_t len);
+
+int
+malloc_heap_destroy_external_seg(struct rte_memseg_list *msl);
+
 int
 malloc_heap_add_external_memory(struct malloc_heap *heap,
 		struct rte_memseg_list *msl);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 8a1747785..09051c236 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -394,6 +394,7 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
+	struct rte_memseg_list *msl;
 	int ret;
 
 	if (heap_name == NULL || va_addr == NULL || len == 0 ||
@@ -418,9 +419,19 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 		goto unlock;
 	}
 
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
 	rte_spinlock_lock(&heap->lock);
 	ret = malloc_heap_remove_external_memory(heap, va_addr, len);
 	rte_spinlock_unlock(&heap->lock);
+	if (ret != 0)
+		goto unlock;
+
+	ret = malloc_heap_destroy_external_seg(msl);
 
 unlock:
 	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
@@ -428,63 +439,12 @@ rte_malloc_heap_memory_remove(const char *heap_name, void *va_addr, size_t len)
 	return ret;
 }
 
-struct sync_mem_walk_arg {
-	void *va_addr;
-	size_t len;
-	int result;
-	bool attach;
-};
-
-static int
-sync_mem_walk(const struct rte_memseg_list *msl, void *arg)
-{
-	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
-	struct sync_mem_walk_arg *wa = arg;
-	size_t len = msl->page_sz * msl->memseg_arr.len;
-
-	if (msl->base_va == wa->va_addr &&
-			len == wa->len) {
-		struct rte_memseg_list *found_msl;
-		int msl_idx, ret;
-
-		/* msl is const */
-		msl_idx = msl - mcfg->memsegs;
-		found_msl = &mcfg->memsegs[msl_idx];
-
-		if (wa->attach) {
-			ret = rte_fbarray_attach(&found_msl->memseg_arr);
-		} else {
-			/* notify all subscribers that a memory area is about to
-			 * be removed
-			 */
-			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
-					msl->base_va, msl->len);
-			ret = rte_fbarray_detach(&found_msl->memseg_arr);
-		}
-
-		if (ret < 0) {
-			wa->result = -rte_errno;
-		} else {
-			/* notify all subscribers that a new memory area was
-			 * added
-			 */
-			if (wa->attach)
-				eal_memalloc_mem_event_notify(
-						RTE_MEM_EVENT_ALLOC,
-						msl->base_va, msl->len);
-			wa->result = 0;
-		}
-		return 1;
-	}
-	return 0;
-}
-
 static int
 sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	struct malloc_heap *heap = NULL;
-	struct sync_mem_walk_arg wa;
+	struct rte_memseg_list *msl;
 	int ret;
 
 	if (heap_name == NULL || va_addr == NULL || len == 0 ||
@@ -511,19 +471,36 @@ sync_memory(const char *heap_name, void *va_addr, size_t len, bool attach)
 	}
 
 	/* find corresponding memseg list to sync to */
-	wa.va_addr = va_addr;
-	wa.len = len;
-	wa.result = -ENOENT; /* fail unless explicitly told to succeed */
-	wa.attach = attach;
-
-	/* we're already holding a read lock */
-	rte_memseg_list_walk_thread_unsafe(sync_mem_walk, &wa);
-
-	if (wa.result < 0) {
-		rte_errno = -wa.result;
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
 		ret = -1;
-	} else
-		ret = 0;
+		goto unlock;
+	}
+
+	if (attach) {
+		ret = rte_fbarray_attach(&msl->memseg_arr);
+		if (ret == 0) {
+			/* notify all subscribers that a new memory area was
+			 * added.
+			 */
+			eal_memalloc_mem_event_notify(RTE_MEM_EVENT_ALLOC,
+					va_addr, len);
+		} else {
+			ret = -1;
+			goto unlock;
+		}
+	} else {
+		/* notify all subscribers that a memory area is about to
+		 * be removed.
+		 */
+		eal_memalloc_mem_event_notify(RTE_MEM_EVENT_FREE,
+				msl->base_va, msl->len);
+		ret = rte_fbarray_detach(&msl->memseg_arr);
+		if (ret < 0) {
+			ret = -1;
+			goto unlock;
+		}
+	}
 unlock:
 	rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
 	return ret;
-- 
2.17.1

* [dpdk-dev] [PATCH v3 3/4] mem: allow registering external memory areas
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
                     ` (2 preceding siblings ...)
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
@ 2018-12-20 15:32   ` Anatoly Burakov
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
  4 siblings, 0 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-12-20 15:32 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, shahafs, yskoh, thomas, shreyansh.jain

The general use-case of using external memory is well covered by
existing external memory API's. However, certain use cases require
manual management of externally allocated memory areas, so this
memory should not be added to the heap. It should, however, be
added to DPDK's internal structures, so that API's like
``rte_mem_virt2memseg`` would work on such external memory segments.

This commit adds such an API to DPDK. The new functions allow
registering and unregistering externally allocated memory areas.
Documentation for them is added as well.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---

Notes:
    v2:
    - Do more stringent alignment checks
    - Fix a bug where n_pages was used as is without
      parameter checking
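
For readers who want the v2 checks in one place, here is a minimal
standalone sketch of the argument preconditions that
``rte_extmem_register`` enforces in the diff below. It is not DPDK code:
``extmem_args_ok`` is a hypothetical helper name, plain C operators stand
in for ``rte_is_power_of_2``, ``RTE_ALIGN`` and ``rte_is_aligned``, and
the ``have_iova_table`` flag stands in for ``iova_addrs != NULL``.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of DPDK: mirrors the argument checks
 * done at the top of rte_extmem_register() in this patch.
 */
static bool
extmem_args_ok(const void *va_addr, size_t len, bool have_iova_table,
		unsigned int n_pages, size_t page_sz)
{
	if (va_addr == NULL || len == 0 || page_sz == 0)
		return false;
	/* page size must be a power of two */
	if ((page_sz & (page_sz - 1)) != 0)
		return false;
	/* length must be a whole number of pages */
	if (len % page_sz != 0)
		return false;
	/* n_pages is only compared when an IOVA table is supplied */
	if (have_iova_table && len / page_sz != n_pages)
		return false;
	/* start address must be aligned to the page size */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0)
		return false;
	return true;
}
```

A caller could run such a check before registering. For example, a 4 MB
area made of two 2 MB pages passes only if it starts on a 2 MB boundary
and, when an IOVA table is given, ``n_pages`` is 2. In the real function,
a failed check returns -1 with ``rte_errno`` set to EINVAL.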

 .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
 doc/guides/rel_notes/release_19_02.rst        |  7 ++
 lib/librte_eal/common/eal_common_memory.c     | 77 +++++++++++++++++++
 lib/librte_eal/common/include/rte_memory.h    | 63 +++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  2 +
 5 files changed, 199 insertions(+), 10 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 19b470e27..190662e80 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -226,17 +226,26 @@ Normally, these options do not need to be changed.
 Support for Externally Allocated Memory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-It is possible to use externally allocated memory in DPDK, using a set of malloc
-heap API's. Support for externally allocated memory is implemented through
-overloading the socket ID - externally allocated heaps will have socket ID's
-that would be considered invalid under normal circumstances. Requesting an
-allocation to take place from a specified externally allocated memory is a
-matter of supplying the correct socket ID to DPDK allocator, either directly
-(e.g. through a call to ``rte_malloc``) or indirectly (through data
-structure-specific allocation API's such as ``rte_ring_create``).
+It is possible to use externally allocated memory in DPDK. There are two ways in
+which using externally allocated memory can work: the malloc heap API's, and
+manual memory management.
 
-Since there is no way DPDK can verify whether memory are is available or valid,
-this responsibility falls on the shoulders of the user. All multiprocess
++ Using heap API's for externally allocated memory
+
+Using a set of malloc heap API's is the recommended way to use externally
+allocated memory in DPDK. In this way, support for externally allocated memory
+is implemented through overloading the socket ID - externally allocated heaps
+will have socket ID's that would be considered invalid under normal
+circumstances. Requesting an allocation to take place from a specified
+externally allocated memory is a matter of supplying the correct socket ID to
+DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
+indirectly (through data structure-specific allocation API's such as
+``rte_ring_create``). Using these API's also ensures that mapping of externally
+allocated memory for DMA is performed on any memory segment that is added
+to a DPDK malloc heap.
+
+Since there is no way DPDK can verify whether memory is available or valid, this
+responsibility falls on the shoulders of the user. All multiprocess
 synchronization is also user's responsibility, as well as ensuring  that all
 calls to add/attach/detach/remove memory are done in the correct order. It is
 not required to attach to a memory area in all processes - only attach to memory
@@ -260,6 +269,37 @@ The expected workflow is as follows:
 For more information, please refer to ``rte_malloc`` API documentation,
 specifically the ``rte_malloc_heap_*`` family of function calls.
 
++ Using externally allocated memory without DPDK API's
+
+While using heap API's is the recommended method of using externally allocated
+memory in DPDK, there are certain use cases where the overhead of DPDK heap API
+is undesirable - for example, when manual memory management is performed on an
+externally allocated area. To support use cases where externally allocated
+memory will not be used as part of normal DPDK workflow, there is also another
+set of API's under the ``rte_extmem_*`` namespace.
+
+These API's are (as their name implies) intended to allow registering or
+unregistering externally allocated memory to/from DPDK's internal page table, to
+allow API's like ``rte_mem_virt2memseg`` etc. to work with externally allocated
+memory. Memory added this way will not be available for any regular DPDK
+allocators; DPDK will leave this memory for the user application to manage.
+
+The expected workflow is as follows:
+
+* Get a pointer to memory area
+* Register memory within DPDK
+    - If IOVA table is not specified, IOVA addresses will be assumed to be
+      unavailable
+* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
+* Use the memory area in your application
+* If memory area is no longer needed, it can be unregistered
+    - If the area was mapped for DMA, unmapping must be performed before
+      unregistering memory
+
+Since these externally allocated memory areas will not be managed by DPDK, it is
+therefore up to the user application to decide how to use them and what to do
+with them once they're registered.
+
 Per-lcore and Shared Variables
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index aee93953e..148be2ba3 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -54,6 +54,7 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+
 * **Added support to free hugepages exactly as originally allocated.**
 
   Some applications using memory event callbacks (especially for managing
@@ -63,6 +64,12 @@ New Features
   hugepage allocations.  A new ``--match-allocations`` EAL init flag has
   been added to fulfill both of these requirements.
 
+* **Added API to register external memory in DPDK.**
+
+  A new ``rte_extmem_register``/``rte_extmem_unregister`` API was added to allow
+  chunks of external memory to be registered with DPDK without adding them to
+  the malloc heap.
+
 * **Updated the enic driver.**
 
   * Added support for ``RTE_ETH_DEV_CLOSE_REMOVE`` flag.
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index d47ea4938..ea43c1362 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -24,6 +24,7 @@
 #include "eal_memalloc.h"
 #include "eal_private.h"
 #include "eal_internal_cfg.h"
+#include "malloc_heap.h"
 
 /*
  * Try to mmap *size bytes in /dev/zero. If it is successful, return the
@@ -775,6 +776,82 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
 	return ret;
 }
 
+int __rte_experimental
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int socket_id, n;
+	int ret = 0;
+
+	if (va_addr == NULL || page_sz == 0 || len == 0 ||
+			!rte_is_power_of_2(page_sz) ||
+			RTE_ALIGN(len, page_sz) != len ||
+			((len / page_sz) != n_pages && iova_addrs != NULL) ||
+			!rte_is_aligned(va_addr, page_sz)) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* make sure the segment doesn't already exist */
+	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
+		rte_errno = EEXIST;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* get next available socket ID */
+	socket_id = mcfg->next_socket_id;
+	if (socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we can create a new memseg */
+	n = len / page_sz;
+	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n,
+			page_sz, "extmem", socket_id) == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
+	/* memseg list successfully created - increment next socket ID */
+	mcfg->next_socket_id++;
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_extmem_unregister(void *va_addr, size_t len)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int ret = 0;
+
+	if (va_addr == NULL || len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our segment */
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+
+	ret = malloc_heap_destroy_external_seg(msl);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index d970825df..ff23fc2c1 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -423,6 +423,69 @@ int __rte_experimental
 rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
 		size_t *offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Register external memory chunk with DPDK.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to register. Must be aligned by ``page_sz``.
+ * @param len
+ *   Length of virtual area to register. Must be aligned by ``page_sz``.
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EEXIST - memory chunk is already registered
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int __rte_experimental
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Unregister external memory chunk with DPDK.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA unmapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to unregister
+ * @param len
+ *   Length of virtual area to unregister
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     ENOENT - memory chunk was not found
+ */
+int __rte_experimental
+rte_extmem_unregister(void *va_addr, size_t len);
+
 /**
  * Dump the physical memory layout to a file.
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 3fe78260d..593691a14 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -296,6 +296,8 @@ EXPERIMENTAL {
 	rte_devargs_remove;
 	rte_devargs_type_count;
 	rte_eal_cleanup;
+	rte_extmem_register;
+	rte_extmem_unregister;
 	rte_fbarray_attach;
 	rte_fbarray_destroy;
 	rte_fbarray_detach;
-- 
2.17.1

* [dpdk-dev] [PATCH v3 4/4] mem: allow usage of non-heap external memory in multiprocess
  2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
                     ` (3 preceding siblings ...)
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 3/4] mem: allow registering external memory areas Anatoly Burakov
@ 2018-12-20 15:32   ` Anatoly Burakov
  4 siblings, 0 replies; 29+ messages in thread
From: Anatoly Burakov @ 2018-12-20 15:32 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, shahafs, yskoh, thomas, shreyansh.jain

Add multiprocess support for externally allocated memory areas that
are not added to DPDK heap (and add relevant doc sections).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
 .../prog_guide/env_abstraction_layer.rst      |  3 +
 lib/librte_eal/common/eal_common_memory.c     | 42 +++++++++++++
 lib/librte_eal/common/include/rte_memory.h    | 59 +++++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  2 +
 4 files changed, 106 insertions(+)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 190662e80..5aaac0bd2 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -290,11 +290,14 @@ The expected workflow is as follows:
 * Register memory within DPDK
     - If IOVA table is not specified, IOVA addresses will be assumed to be
       unavailable
+    - Other processes must attach to the memory area before they can use it
 * Perform DMA mapping with ``rte_vfio_dma_map`` if needed
 * Use the memory area in your application
 * If memory area is no longer needed, it can be unregistered
     - If the area was mapped for DMA, unmapping must be performed before
       unregistering memory
+    - Other processes must detach from the memory area before it can be
+      unregistered
 
 Since these externally allocated memory areas will not be managed by DPDK, it is
 therefore up to the user application to decide how to use them and what to do
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index ea43c1362..051159f80 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -852,6 +852,48 @@ rte_extmem_unregister(void *va_addr, size_t len)
 	return ret;
 }
 
+static int
+sync_memory(void *va_addr, size_t len, bool attach)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int ret = 0;
+
+	if (va_addr == NULL || len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	/* find our segment */
+	msl = malloc_heap_find_external_seg(va_addr, len);
+	if (msl == NULL) {
+		rte_errno = ENOENT;
+		ret = -1;
+		goto unlock;
+	}
+	if (attach)
+		ret = rte_fbarray_attach(&msl->memseg_arr);
+	else
+		ret = rte_fbarray_detach(&msl->memseg_arr);
+
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+	return ret;
+}
+
+int __rte_experimental
+rte_extmem_attach(void *va_addr, size_t len)
+{
+	return sync_memory(va_addr, len, true);
+}
+
+int __rte_experimental
+rte_extmem_detach(void *va_addr, size_t len)
+{
+	return sync_memory(va_addr, len, false);
+}
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index ff23fc2c1..7ca703bb1 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -435,6 +435,10 @@ rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
  * @note This API will not perform any DMA mapping. It is expected that user
  *   will do that themselves.
  *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling ``rte_extmem_attach`` in
+ *   each other process.
+ *
  * @param va_addr
  *   Start of virtual area to register. Must be aligned by ``page_sz``.
  * @param len
@@ -472,6 +476,9 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
  * @note This API will not perform any DMA unmapping. It is expected that user
  *   will do that themselves.
  *
+ * @note Before calling this function, all other processes must call
+ *   ``rte_extmem_detach`` to detach from the memory area.
+ *
  * @param va_addr
  *   Start of virtual area to unregister
  * @param len
@@ -486,6 +493,58 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 int __rte_experimental
 rte_extmem_unregister(void *va_addr, size_t len);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Attach to external memory chunk registered in another process.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to attach to
+ * @param len
+ *   Length of virtual area to attach to
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     ENOENT - memory chunk was not found
+ */
+int __rte_experimental
+rte_extmem_attach(void *va_addr, size_t len);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Detach from external memory chunk registered in another process.
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA unmapping. It is expected that user
+ *   will do that themselves.
+ *
+ * @param va_addr
+ *   Start of virtual area to detach from
+ * @param len
+ *   Length of virtual area to detach from
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     ENOENT - memory chunk was not found
+ */
+int __rte_experimental
+rte_extmem_detach(void *va_addr, size_t len);
+
 /**
  * Dump the physical memory layout to a file.
  *
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 593691a14..eb5f7b9cb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -296,6 +296,8 @@ EXPERIMENTAL {
 	rte_devargs_remove;
 	rte_devargs_type_count;
 	rte_eal_cleanup;
+	rte_extmem_attach;
+	rte_extmem_detach;
 	rte_extmem_register;
 	rte_extmem_unregister;
 	rte_fbarray_attach;
-- 
2.17.1

* Re: [dpdk-dev] [PATCH v3 0/4] Allow using external memory without malloc
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
@ 2018-12-20 16:16     ` Stephen Hemminger
  2018-12-20 17:18       ` Thomas Monjalon
  2018-12-20 17:17     ` Thomas Monjalon
  1 sibling, 1 reply; 29+ messages in thread
From: Stephen Hemminger @ 2018-12-20 16:16 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, shahafs, yskoh, thomas, shreyansh.jain

On Thu, 20 Dec 2018 15:32:37 +0000
Anatoly Burakov <anatoly.burakov@intel.com> wrote:

> Currently, the only way to use externally allocated memory
> is through rte_malloc API's. While this is fine for a lot
> of use cases, it may not be suitable for certain other use
> cases like manual memory management, etc.
> 
> This patchset adds another API to register memory segments
> with DPDK (so that API's like ``rte_mem_virt2memseg`` could
> be relied on by PMD's and such), but not create a malloc
> heap out of them.
> 
> Aside from the obvious (not adding memory to a heap), the
> other major difference between this API and the
> ``rte_malloc_heap_*`` external memory functions is the fact
> that no DMA mapping is performed automatically, as well as
> no mem event callbacks are triggered.
> 
> This really draws a line in the sand, and there are now two
> ways of doing things - do everything automatically (using
> the ``rte_malloc_heap_*`` API's), or do everything manually
> (``rte_extmem_*`` and future DMA mapping API [1] that would
> replace ``rte_vfio_dma_map``). This way, the consistency of
> API is kept, and flexibility is also allowed.
> 
> [1] https://mails.dpdk.org/archives/dev/2018-November/118175.html

Where are the examples for this? Give a sample application maybe.

Also there are no test cases.

* Re: [dpdk-dev] [PATCH v3 0/4] Allow using external memory without malloc
  2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
  2018-12-20 16:16     ` Stephen Hemminger
@ 2018-12-20 17:17     ` Thomas Monjalon
  1 sibling, 0 replies; 29+ messages in thread
From: Thomas Monjalon @ 2018-12-20 17:17 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, shahafs, yskoh, shreyansh.jain

> Anatoly Burakov (4):
>   malloc: separate creating memseg list and malloc heap
>   malloc: separate destroying memseg list and heap data
>   mem: allow registering external memory areas
>   mem: allow usage of non-heap external memory in multiprocess

Applied, thanks

* Re: [dpdk-dev] [PATCH v3 0/4] Allow using external memory without malloc
  2018-12-20 16:16     ` Stephen Hemminger
@ 2018-12-20 17:18       ` Thomas Monjalon
  2018-12-21  9:17         ` Burakov, Anatoly
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Monjalon @ 2018-12-20 17:18 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Stephen Hemminger, shahafs, yskoh, shreyansh.jain

20/12/2018 17:16, Stephen Hemminger:
> On Thu, 20 Dec 2018 15:32:37 +0000
> Anatoly Burakov <anatoly.burakov@intel.com> wrote:
> 
> > Currently, the only way to use externally allocated memory
> > is through rte_malloc API's. While this is fine for a lot
> > of use cases, it may not be suitable for certain other use
> > cases like manual memory management, etc.
> > 
> > This patchset adds another API to register memory segments
> > with DPDK (so that API's like ``rte_mem_virt2memseg`` could
> > be relied on by PMD's and such), but not create a malloc
> > heap out of them.
> > 
> > Aside from the obvious (not adding memory to a heap), the
> > other major difference between this API and the
> > ``rte_malloc_heap_*`` external memory functions is the fact
> > that no DMA mapping is performed automatically, as well as
> > no mem event callbacks are triggered.
> > 
> > This really draws a line in the sand, and there are now two
> > ways of doing things - do everything automatically (using
> > the ``rte_malloc_heap_*`` API's), or do everything manually
> > (``rte_extmem_*`` and future DMA mapping API [1] that would
> > replace ``rte_vfio_dma_map``). This way, the consistency of
> > API is kept, and flexibility is also allowed.
> > 
> > [1] https://mails.dpdk.org/archives/dev/2018-November/118175.html
> 
> Where are the examples for this? Give a sample application maybe.
> 
> Also there are no test cases.

It looks to be a big task, but yes, would be nice to have test
of external memory allocation in DPDK unit tests.

* Re: [dpdk-dev] [PATCH v3 0/4] Allow using external memory without malloc
  2018-12-20 17:18       ` Thomas Monjalon
@ 2018-12-21  9:17         ` Burakov, Anatoly
  0 siblings, 0 replies; 29+ messages in thread
From: Burakov, Anatoly @ 2018-12-21  9:17 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Stephen Hemminger, shahafs, yskoh, shreyansh.jain

On 20-Dec-18 5:18 PM, Thomas Monjalon wrote:
> 20/12/2018 17:16, Stephen Hemminger:
>> On Thu, 20 Dec 2018 15:32:37 +0000
>> Anatoly Burakov <anatoly.burakov@intel.com> wrote:
>>
>>> Currently, the only way to use externally allocated memory
>>> is through rte_malloc API's. While this is fine for a lot
>>> of use cases, it may not be suitable for certain other use
>>> cases like manual memory management, etc.
>>>
>>> This patchset adds another API to register memory segments
>>> with DPDK (so that API's like ``rte_mem_virt2memseg`` could
>>> be relied on by PMD's and such), but not create a malloc
>>> heap out of them.
>>>
>>> Aside from the obvious (not adding memory to a heap), the
>>> other major difference between this API and the
>>> ``rte_malloc_heap_*`` external memory functions is the fact
>>> that no DMA mapping is performed automatically, as well as
>>> no mem event callbacks are triggered.
>>>
>>> This really draws a line in the sand, and there are now two
>>> ways of doing things - do everything automatically (using
>>> the ``rte_malloc_heap_*`` API's), or do everything manually
>>> (``rte_extmem_*`` and future DMA mapping API [1] that would
>>> replace ``rte_vfio_dma_map``). This way, the consistency of
>>> API is kept, and flexibility is also allowed.
>>>
>>> [1] https://mails.dpdk.org/archives/dev/2018-November/118175.html
>>
>> Where are the examples for this? Give a sample application maybe.
>>
>> Also there are no test cases.
> 
> It looks to be a big task, but yes, would be nice to have test
> of external memory allocation in DPDK unit tests.
> 

I imagine if I submitted patches for this, since it's test code, it can
go into rc1? Or is that considered a "feature"?

I don't think it will be a lot of code, there are only 4 new API calls. 
Extending extmem autotest should do the trick. Adding a new testpmd mode 
is also possible but less trivial, and can be postponed to 19.05.

-- 
Thanks,
Anatoly

end of thread, other threads:[~2018-12-21  9:17 UTC | newest]

Thread overview: 29+ messages
2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
2018-11-29 13:48 ` [dpdk-dev] [PATCH 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
2018-12-14  9:33   ` Yongseok Koh
2018-11-29 13:48 ` [dpdk-dev] [PATCH 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
2018-12-14  9:34   ` Yongseok Koh
2018-11-29 13:48 ` [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas Anatoly Burakov
2018-12-14  9:55   ` Yongseok Koh
2018-12-14 11:03     ` Burakov, Anatoly
2018-11-29 13:48 ` [dpdk-dev] [PATCH 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
2018-12-14  9:56   ` Yongseok Koh
2018-12-02  5:48 ` [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Shahaf Shuler
2018-12-02 23:28   ` Yongseok Koh
2018-12-03 10:23     ` Burakov, Anatoly
2018-12-12 12:55       ` Yongseok Koh
2018-12-12 13:17         ` Burakov, Anatoly
2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
2018-12-20 16:16     ` Stephen Hemminger
2018-12-20 17:18       ` Thomas Monjalon
2018-12-21  9:17         ` Burakov, Anatoly
2018-12-20 17:17     ` Thomas Monjalon
2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 3/4] mem: allow registering external memory areas Anatoly Burakov
2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 3/4] mem: allow registering external memory areas Anatoly Burakov
2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
