DPDK patches and discussions
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Yongseok Koh <yskoh@mellanox.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	John McNamara <john.mcnamara@intel.com>,
	Marko Kovacevic <marko.kovacevic@intel.com>,
	Shahaf Shuler <shahafs@mellanox.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	"shreyansh.jain@nxp.com" <shreyansh.jain@nxp.com>
Subject: Re: [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas
Date: Fri, 14 Dec 2018 11:03:14 +0000
Message-ID: <d9e56373-3b85-5fd5-fa97-48d68e508b8b@intel.com>
In-Reply-To: <20181214095531.GC12221@mtidpdk.mti.labs.mlnx>

On 14-Dec-18 9:55 AM, Yongseok Koh wrote:
> On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
>> The general use-case of using external memory is well covered by
>> existing external memory API's. However, certain use cases require
>> manual management of externally allocated memory areas, so this
>> memory should not be added to the heap. It should, however, be
>> added to DPDK's internal structures, so that API's like
>> ``rte_virt2memseg`` would work on such external memory segments.
>>
>> This commit adds such an API to DPDK. The new functions allow
>> registering and unregistering externally allocated memory areas, and
>> the commit also adds documentation for them.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>   .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
>>   lib/librte_eal/common/eal_common_memory.c     | 74 +++++++++++++++++++
>>   lib/librte_eal/common/include/rte_memory.h    | 63 ++++++++++++++++
>>   lib/librte_eal/rte_eal_version.map            |  2 +
>>   4 files changed, 189 insertions(+), 10 deletions(-)
>>
>> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
>> index 8b5d050c7..d7799b626 100644
>> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
>> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
>> @@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
>>   Support for Externally Allocated Memory
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>   
>> -It is possible to use externally allocated memory in DPDK, using a set of malloc
>> -heap API's. Support for externally allocated memory is implemented through
>> -overloading the socket ID - externally allocated heaps will have socket ID's
>> -that would be considered invalid under normal circumstances. Requesting an
>> -allocation to take place from a specified externally allocated memory is a
>> -matter of supplying the correct socket ID to DPDK allocator, either directly
>> -(e.g. through a call to ``rte_malloc``) or indirectly (through data
>> -structure-specific allocation API's such as ``rte_ring_create``).
>> +It is possible to use externally allocated memory in DPDK. There are two ways in
>> +which using externally allocated memory can work: the malloc heap API's, and
>> +manual memory management.
>>   
>> -Since there is no way DPDK can verify whether memory are is available or valid,
>> -this responsibility falls on the shoulders of the user. All multiprocess
>> ++ Using heap API's for externally allocated memory
>> +
>> +Using a set of malloc heap API's is the recommended way to use externally
>> +allocated memory in DPDK. In this way, support for externally allocated memory
>> +is implemented through overloading the socket ID - externally allocated heaps
>> +will have socket ID's that would be considered invalid under normal
>> +circumstances. Requesting an allocation to take place from a specified
>> +externally allocated memory is a matter of supplying the correct socket ID to
>> +DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
>> +indirectly (through data structure-specific allocation API's such as
>> +``rte_ring_create``). Using these API's also ensures that mapping of externally
>> +allocated memory for DMA is performed on any memory segment that is added to a
>> +DPDK malloc heap.
>> +
>> +Since there is no way DPDK can verify whether memory is available or valid, this
>> +responsibility falls on the shoulders of the user. All multiprocess
>>   synchronization is also user's responsibility, as well as ensuring  that all
>>   calls to add/attach/detach/remove memory are done in the correct order. It is
>>   not required to attach to a memory area in all processes - only attach to memory
>> @@ -246,6 +255,37 @@ The expected workflow is as follows:
>>   For more information, please refer to ``rte_malloc`` API documentation,
>>   specifically the ``rte_malloc_heap_*`` family of function calls.
>>   
>> ++ Using externally allocated memory without DPDK API's
>> +
>> +While using heap API's is the recommended method of using externally allocated
>> +memory in DPDK, there are certain use cases where the overhead of DPDK heap API
>> +is undesirable - for example, when manual memory management is performed on an
>> +externally allocated area. To support use cases where externally allocated
>> +memory will not be used as part of normal DPDK workflow, there is also another
>> +set of API's under the ``rte_extmem_*`` namespace.
>> +
>> +These API's are (as their name implies) intended to allow registering or
>> +unregistering externally allocated memory to/from DPDK's internal page table, to
>> +allow API's like ``rte_virt2memseg`` etc. to work with externally allocated
>> +memory. Memory added this way will not be available for any regular DPDK
>> +allocators; DPDK will leave this memory for the user application to manage.
>> +
>> +The expected workflow is as follows:
>> +
>> +* Get a pointer to memory area
>> +* Register memory within DPDK
>> +    - If IOVA table is not specified, IOVA addresses will be assumed to be
>> +      unavailable
>> +* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
>> +* Use the memory area in your application
>> +* If memory area is no longer needed, it can be unregistered
>> +    - If the area was mapped for DMA, unmapping must be performed before
>> +      unregistering memory
>> +
>> +Since these externally allocated memory areas will not be managed by DPDK, it is
>> +up to the user application to decide how to use them and what to do with them
>> +once they're registered.
>> +
>>   Per-lcore and Shared Variables
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>   
>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
>> index d47ea4938..a2e085ae8 100644
>> --- a/lib/librte_eal/common/eal_common_memory.c
>> +++ b/lib/librte_eal/common/eal_common_memory.c
>> @@ -24,6 +24,7 @@
>>   #include "eal_memalloc.h"
>>   #include "eal_private.h"
>>   #include "eal_internal_cfg.h"
>> +#include "malloc_heap.h"
>>   
>>   /*
>>    * Try to mmap *size bytes in /dev/zero. If it is successful, return the
>> @@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>>   	return ret;
>>   }
>>   
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> +		unsigned int n_pages, size_t page_sz)
>> +{
>> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> +	unsigned int socket_id;
>> +	int ret = 0;
>> +
>> +	if (va_addr == NULL || page_sz == 0 || len == 0 ||
>> +			!rte_is_power_of_2(page_sz) ||
>> +			RTE_ALIGN(len, page_sz) != len) {
>> +		rte_errno = EINVAL;
>> +		return -1;
>> +	}
> 
> Isn't it better to have more sanity checks? E.g., (len / page_sz == n_pages) like
> rte_malloc_heap_memory_add(). And what about the alignment of va_addr? Shouldn't
> it be page-aligned, if I'm not mistaken? rte_malloc_heap_memory_add() doesn't
> have that check either... Also, you might want to add to the documentation that
> the granularity of these registrations is a page.
> 

Hi Yongseok,

Thanks for your review.

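n_pages is allowed to be 0 if iova_addrs is NULL (the header documents it
as ignored in that case), so a (len / page_sz == n_pages) check cannot be
unconditional. However, you're correct that more sanity checking, and
documentation regarding page alignment, would be beneficial. I'll submit a v2.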
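A minimal sketch of what the stricter argument checks discussed above might look like, written as standalone C so it does not depend on DPDK headers. The helper name `extmem_params_valid` is hypothetical and not part of DPDK; `uint64_t` stands in for `rte_iova_t`, which DPDK defines as a 64-bit integer typedef. Following the reply above, `n_pages` is only compared against `len / page_sz` when an IOVA table is actually supplied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a DPDK API): stricter validation of the
 * arguments to rte_extmem_register(), along the lines of the review. */
static bool
extmem_params_valid(const void *va_addr, size_t len,
		const uint64_t *iova_addrs, unsigned int n_pages,
		size_t page_sz)
{
	/* basic checks, plus: page size must be a non-zero power of two */
	if (va_addr == NULL || len == 0 || page_sz == 0 ||
			(page_sz & (page_sz - 1)) != 0)
		return false;
	/* start of the area must be page-aligned */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0)
		return false;
	/* length must be a whole number of pages */
	if ((len & (page_sz - 1)) != 0)
		return false;
	/* n_pages is ignored without an IOVA table; otherwise the table
	 * must have exactly one entry per page */
	if (iova_addrs != NULL && n_pages != len / page_sz)
		return false;
	return true;
}
```

With a 2 MB page size, for example, an area starting at 0x200001 is rejected even if its length is a multiple of the page size, which is the case the original patch did not catch.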


> Otherwise,
> 
> Acked-by: Yongseok Koh <yskoh@mellanox.com>
> Thanks
> 
>> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>> +
>> +	/* make sure the segment doesn't already exist */
>> +	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
>> +		rte_errno = EEXIST;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* get next available socket ID */
>> +	socket_id = mcfg->next_socket_id;
>> +	if (socket_id > INT32_MAX) {
>> +		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
>> +		rte_errno = ENOSPC;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* we can create a new memseg */
>> +	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
>> +			page_sz, "extmem", socket_id) == NULL) {
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* memseg list successfully created - increment next socket ID */
>> +	mcfg->next_socket_id++;
>> +unlock:
>> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
>> +	return ret;
>> +}
>> +
>> +int __rte_experimental
>> +rte_extmem_unregister(void *va_addr, size_t len)
>> +{
>> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> +	struct rte_memseg_list *msl;
>> +	int ret = 0;
>> +
>> +	if (va_addr == NULL || len == 0) {
>> +		rte_errno = EINVAL;
>> +		return -1;
>> +	}
>> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>> +
>> +	/* find our segment */
>> +	msl = malloc_heap_find_external_seg(va_addr, len);
>> +	if (msl == NULL) {
>> +		rte_errno = ENOENT;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	ret = malloc_heap_destroy_external_seg(msl);
>> +unlock:
>> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
>> +	return ret;
>> +}
>> +
>>   /* init memory subsystem */
>>   int
>>   rte_eal_memory_init(void)
>> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
>> index d970825df..4a43c1a9e 100644
>> --- a/lib/librte_eal/common/include/rte_memory.h
>> +++ b/lib/librte_eal/common/include/rte_memory.h
>> @@ -423,6 +423,69 @@ int __rte_experimental
>>   rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>>   		size_t *offset);
>>   
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Register external memory chunk with DPDK.
>> + *
>> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
>> + *   API's.
>> + *
>> + * @note This API will not perform any DMA mapping. It is expected that the
>> + *   user will do that themselves.
>> + *
>> + * @param va_addr
>> + *   Start of virtual area to register
>> + * @param len
>> + *   Length of virtual area to register
>> + * @param iova_addrs
>> + *   Array of page IOVA addresses corresponding to each page in this memory
>> + *   area. Can be NULL, in which case page IOVA addresses will be set to
>> + *   RTE_BAD_IOVA.
>> + * @param n_pages
>> + *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
>> + *   is NULL.
>> + * @param page_sz
>> + *   Page size of the underlying memory
>> + *
>> + * @return
>> + *   - 0 on success
>> + *   - -1 in case of error, with rte_errno set to one of the following:
>> + *     EINVAL - one of the parameters was invalid
>> + *     EEXIST - memory chunk is already registered
>> + *     ENOSPC - no more space in internal config to store a new memory chunk
>> + */
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> +		unsigned int n_pages, size_t page_sz);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Unregister external memory chunk with DPDK.
>> + *
>> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
>> + *   API's.
>> + *
>> + * @note This API will not perform any DMA unmapping. It is expected that the
>> + *   user will do that themselves.
>> + *
>> + * @param va_addr
>> + *   Start of virtual area to unregister
>> + * @param len
>> + *   Length of virtual area to unregister
>> + *
>> + * @return
>> + *   - 0 on success
>> + *   - -1 in case of error, with rte_errno set to one of the following:
>> + *     EINVAL - one of the parameters was invalid
>> + *     ENOENT - memory chunk was not found
>> + */
>> +int __rte_experimental
>> +rte_extmem_unregister(void *va_addr, size_t len);
>> +
>>   /**
>>    * Dump the physical memory layout to a file.
>>    *
>> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
>> index 3fe78260d..593691a14 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>>   	rte_devargs_remove;
>>   	rte_devargs_type_count;
>>   	rte_eal_cleanup;
>> +	rte_extmem_register;
>> +	rte_extmem_unregister;
>>   	rte_fbarray_attach;
>>   	rte_fbarray_destroy;
>>   	rte_fbarray_detach;
>> -- 
>> 2.17.1
> 

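The workflow in the documentation hunk of the patch (get a pointer to a memory area, register it, optionally DMA-map it, use it, then unmap and unregister it) can be sketched from the application side. This is a minimal, DPDK-free sketch that only prepares the arguments rte_extmem_register() expects: it uses an anonymous mmap() of regular system pages and assumes IOVA-as-VA mode, so each page's IOVA equals its virtual address. The DPDK calls appear only in comments. `struct ext_area`, `ext_area_prepare()` and `ext_area_release()` are hypothetical names, not DPDK API.

```c
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS on glibc */
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical container for the rte_extmem_register() arguments. */
struct ext_area {
	void *va;
	size_t len;
	size_t page_sz;
	unsigned int n_pages;
	uint64_t *iovas; /* stands in for rte_iova_t[] */
};

/* Step 1: get a page-aligned area and build its IOVA table.
 * Assumes IOVA-as-VA mode, i.e. IOVA == virtual address. */
static int
ext_area_prepare(struct ext_area *a, unsigned int n_pages)
{
	long pg = sysconf(_SC_PAGESIZE);

	if (pg <= 0 || n_pages == 0)
		return -1;
	a->page_sz = (size_t)pg;
	a->n_pages = n_pages;
	a->len = (size_t)n_pages * a->page_sz;
	/* mmap() always returns page-aligned memory */
	a->va = mmap(NULL, a->len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (a->va == MAP_FAILED)
		return -1;
	a->iovas = calloc(n_pages, sizeof(*a->iovas));
	if (a->iovas == NULL) {
		munmap(a->va, a->len);
		return -1;
	}
	for (unsigned int i = 0; i < n_pages; i++)
		a->iovas[i] = (uint64_t)(uintptr_t)a->va +
				(uint64_t)i * a->page_sz;
	/*
	 * Step 2 (DPDK, not executed here):
	 *   rte_extmem_register(a->va, a->len, a->iovas, a->n_pages,
	 *           a->page_sz);
	 * Step 3, only if a device needs DMA access via VFIO; a single
	 * call is valid here because IOVAs are contiguous in VA mode:
	 *   rte_vfio_dma_map((uint64_t)(uintptr_t)a->va, a->iovas[0], a->len);
	 */
	return 0;
}

/* Last step: tear down in reverse order. DMA unmapping
 * (rte_vfio_dma_unmap) and then rte_extmem_unregister(a->va, a->len)
 * would have to happen before the memory is released. */
static void
ext_area_release(struct ext_area *a)
{
	munmap(a->va, a->len);
	free(a->iovas);
	a->va = NULL;
	a->iovas = NULL;
}
```

With hugepage-backed memory, page_sz would instead be the hugepage size, and in IOVA-as-PA mode the table would have to hold each page's physical address rather than its virtual address.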

-- 
Thanks,
Anatoly


Thread overview: 29+ messages
2018-11-29 13:48 [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Anatoly Burakov
2018-11-29 13:48 ` [dpdk-dev] [PATCH 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
2018-12-14  9:33   ` Yongseok Koh
2018-11-29 13:48 ` [dpdk-dev] [PATCH 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
2018-12-14  9:34   ` Yongseok Koh
2018-11-29 13:48 ` [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas Anatoly Burakov
2018-12-14  9:55   ` Yongseok Koh
2018-12-14 11:03     ` Burakov, Anatoly [this message]
2018-11-29 13:48 ` [dpdk-dev] [PATCH 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
2018-12-14  9:56   ` Yongseok Koh
2018-12-02  5:48 ` [dpdk-dev] [PATCH 0/4] Allow using external memory without malloc Shahaf Shuler
2018-12-02 23:28   ` Yongseok Koh
2018-12-03 10:23     ` Burakov, Anatoly
2018-12-12 12:55       ` Yongseok Koh
2018-12-12 13:17         ` Burakov, Anatoly
2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
2018-12-20 16:16     ` Stephen Hemminger
2018-12-20 17:18       ` Thomas Monjalon
2018-12-21  9:17         ` Burakov, Anatoly
2018-12-20 17:17     ` Thomas Monjalon
2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 3/4] mem: allow registering external memory areas Anatoly Burakov
2018-12-20 15:32   ` [dpdk-dev] [PATCH v3 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 1/4] malloc: separate creating memseg list and malloc heap Anatoly Burakov
2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 2/4] malloc: separate destroying memseg list and heap data Anatoly Burakov
2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 3/4] mem: allow registering external memory areas Anatoly Burakov
2018-12-14 11:50 ` [dpdk-dev] [PATCH v2 4/4] mem: allow usage of non-heap external memory in multiprocess Anatoly Burakov
