From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Yongseok Koh <yskoh@mellanox.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
John McNamara <john.mcnamara@intel.com>,
Marko Kovacevic <marko.kovacevic@intel.com>,
Shahaf Shuler <shahafs@mellanox.com>,
Thomas Monjalon <thomas@monjalon.net>,
"shreyansh.jain@nxp.com" <shreyansh.jain@nxp.com>
Subject: Re: [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas
Date: Fri, 14 Dec 2018 11:03:14 +0000
Message-ID: <d9e56373-3b85-5fd5-fa97-48d68e508b8b@intel.com>
In-Reply-To: <20181214095531.GC12221@mtidpdk.mti.labs.mlnx>
On 14-Dec-18 9:55 AM, Yongseok Koh wrote:
> On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
>> The general use-case of using external memory is well covered by
>> existing external memory API's. However, certain use cases require
>> manual management of externally allocated memory areas, so this
>> memory should not be added to the heap. It should, however, be
>> added to DPDK's internal structures, so that API's like
>> ``rte_virt2memseg`` would work on such external memory segments.
>>
>> This commit adds such an API to DPDK. The new functions allow
>> registering and unregistering externally allocated memory areas.
>> Documentation for them is added as well.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>> .../prog_guide/env_abstraction_layer.rst | 60 ++++++++++++---
>> lib/librte_eal/common/eal_common_memory.c | 74 +++++++++++++++++++
>> lib/librte_eal/common/include/rte_memory.h | 63 ++++++++++++++++
>> lib/librte_eal/rte_eal_version.map | 2 +
>> 4 files changed, 189 insertions(+), 10 deletions(-)
>>
>> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
>> index 8b5d050c7..d7799b626 100644
>> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
>> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
>> @@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
>> Support for Externally Allocated Memory
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> -It is possible to use externally allocated memory in DPDK, using a set of malloc
>> -heap API's. Support for externally allocated memory is implemented through
>> -overloading the socket ID - externally allocated heaps will have socket ID's
>> -that would be considered invalid under normal circumstances. Requesting an
>> -allocation to take place from a specified externally allocated memory is a
>> -matter of supplying the correct socket ID to DPDK allocator, either directly
>> -(e.g. through a call to ``rte_malloc``) or indirectly (through data
>> -structure-specific allocation API's such as ``rte_ring_create``).
>> +It is possible to use externally allocated memory in DPDK. There are two ways in
>> +which using externally allocated memory can work: the malloc heap API's, and
>> +manual memory management.
>>
>> -Since there is no way DPDK can verify whether memory are is available or valid,
>> -this responsibility falls on the shoulders of the user. All multiprocess
>> ++ Using heap API's for externally allocated memory
>> +
>> +Using a set of malloc heap API's is the recommended way to use externally
>> +allocated memory in DPDK. In this way, support for externally allocated memory
>> +is implemented through overloading the socket ID - externally allocated heaps
>> +will have socket ID's that would be considered invalid under normal
>> +circumstances. Requesting an allocation to take place from a specified
>> +externally allocated memory is a matter of supplying the correct socket ID to
>> +DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
>> +indirectly (through data structure-specific allocation API's such as
>> +``rte_ring_create``). Using these API's also ensures that mapping of externally
>> +allocated memory for DMA is also performed on any memory segment that is added
>> +to a DPDK malloc heap.
>> +
>> +Since there is no way DPDK can verify whether memory is available or valid, this
>> +responsibility falls on the shoulders of the user. All multiprocess
>> synchronization is also user's responsibility, as well as ensuring that all
>> calls to add/attach/detach/remove memory are done in the correct order. It is
>> not required to attach to a memory area in all processes - only attach to memory
>> @@ -246,6 +255,37 @@ The expected workflow is as follows:
>> For more information, please refer to ``rte_malloc`` API documentation,
>> specifically the ``rte_malloc_heap_*`` family of function calls.
>>
>> ++ Using externally allocated memory without DPDK API's
>> +
>> +While using heap API's is the recommended method of using externally allocated
>> +memory in DPDK, there are certain use cases where the overhead of DPDK heap API
>> +is undesirable - for example, when manual memory management is performed on an
>> +externally allocated area. To support use cases where externally allocated
>> +memory will not be used as part of normal DPDK workflow, there is also another
>> +set of API's under the ``rte_extmem_*`` namespace.
>> +
>> +These API's are (as their name implies) intended to allow registering or
>> +unregistering externally allocated memory to/from DPDK's internal page table, to
>> +allow API's like ``rte_virt2memseg`` etc. to work with externally allocated
>> +memory. Memory added this way will not be available for any regular DPDK
>> +allocators; DPDK will leave this memory for the user application to manage.
>> +
>> +The expected workflow is as follows:
>> +
>> +* Get a pointer to memory area
>> +* Register memory within DPDK
>> + - If IOVA table is not specified, IOVA addresses will be assumed to be
>> + unavailable
>> +* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
>> +* Use the memory area in your application
>> +* If memory area is no longer needed, it can be unregistered
>> + - If the area was mapped for DMA, unmapping must be performed before
>> + unregistering memory
>> +
>> +Since these externally allocated memory areas will not be managed by DPDK, it is
>> +therefore up to the user application to decide how to use them and what to do
>> +with them once they're registered.
>> +
>> Per-lcore and Shared Variables
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
>> index d47ea4938..a2e085ae8 100644
>> --- a/lib/librte_eal/common/eal_common_memory.c
>> +++ b/lib/librte_eal/common/eal_common_memory.c
>> @@ -24,6 +24,7 @@
>> #include "eal_memalloc.h"
>> #include "eal_private.h"
>> #include "eal_internal_cfg.h"
>> +#include "malloc_heap.h"
>>
>> /*
>> * Try to mmap *size bytes in /dev/zero. If it is successful, return the
>> @@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>> return ret;
>> }
>>
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> + unsigned int n_pages, size_t page_sz)
>> +{
>> + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> + unsigned int socket_id;
>> + int ret = 0;
>> +
>> + if (va_addr == NULL || page_sz == 0 || len == 0 ||
>> + !rte_is_power_of_2(page_sz) ||
>> + RTE_ALIGN(len, page_sz) != len) {
>> + rte_errno = EINVAL;
>> + return -1;
>> + }
>
> Isn't it better to have more sanity check? E.g, (len / page_sz == n_pages) like
> rte_malloc_heap_memory_add(). And what about the alignment of va_addr? Shouldn't
> it be page-aligned if I'm not mistaken? rte_malloc_heap_memory_add() doesn't
> have it either... Also you might want to add it to documentation that
> granularity of these registrations is a page.
>
Hi Yongseok,

Thanks for your review.

n_pages is allowed to be 0 if iova_addrs[] is NULL, so the n_pages check
cannot be unconditional. However, you're correct that more sanity
checking and documentation regarding page alignment would be beneficial.
I'll submit a v2.
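
For illustration, a rough sketch of what the stricter argument checks
could look like. This is standalone C, not the actual v2 code: the helper
name extmem_args_valid is hypothetical, uint64_t stands in for
rte_iova_t, and comparing n_pages against len / page_sz only when an IOVA
table is supplied is an assumption about how the two review points could
be combined.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of DPDK: checks arguments shaped like
 * those of rte_extmem_register(). uint64_t stands in for rte_iova_t.
 */
static bool
extmem_args_valid(const void *va_addr, size_t len, const uint64_t *iova_addrs,
		unsigned int n_pages, size_t page_sz)
{
	/* basic checks, as in the patch under review */
	if (va_addr == NULL || len == 0 || page_sz == 0)
		return false;
	/* page size must be a power of two */
	if ((page_sz & (page_sz - 1)) != 0)
		return false;
	/* length must be a whole number of pages */
	if ((len & (page_sz - 1)) != 0)
		return false;
	/* assumed extra check: start address must be page-aligned */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0)
		return false;
	/* assumed extra check: n_pages only matters with an IOVA table */
	if (iova_addrs != NULL && len / page_sz != n_pages)
		return false;
	return true;
}
```

With these rules, a call with iova_addrs == NULL and n_pages == 0 still
passes, while an unaligned va_addr or a mismatched n_pages alongside a
non-NULL table is rejected.
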
> Otherwise,
>
> Acked-by: Yongseok Koh <yskoh@mellanox.com>
> Thanks
>
>> + rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>> +
>> + /* make sure the segment doesn't already exist */
>> + if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
>> + rte_errno = EEXIST;
>> + ret = -1;
>> + goto unlock;
>> + }
>> +
>> + /* get next available socket ID */
>> + socket_id = mcfg->next_socket_id;
>> + if (socket_id > INT32_MAX) {
>> + RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
>> + rte_errno = ENOSPC;
>> + ret = -1;
>> + goto unlock;
>> + }
>> +
>> + /* we can create a new memseg */
>> + if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
>> + page_sz, "extmem", socket_id) == NULL) {
>> + ret = -1;
>> + goto unlock;
>> + }
>> +
>> + /* memseg list successfully created - increment next socket ID */
>> + mcfg->next_socket_id++;
>> +unlock:
>> + rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
>> + return ret;
>> +}
>> +
>> +int __rte_experimental
>> +rte_extmem_unregister(void *va_addr, size_t len)
>> +{
>> + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> + struct rte_memseg_list *msl;
>> + int ret = 0;
>> +
>> + if (va_addr == NULL || len == 0) {
>> + rte_errno = EINVAL;
>> + return -1;
>> + }
>> + rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>> +
>> + /* find our segment */
>> + msl = malloc_heap_find_external_seg(va_addr, len);
>> + if (msl == NULL) {
>> + rte_errno = ENOENT;
>> + ret = -1;
>> + goto unlock;
>> + }
>> +
>> + ret = malloc_heap_destroy_external_seg(msl);
>> +unlock:
>> + rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
>> + return ret;
>> +}
>> +
>> /* init memory subsystem */
>> int
>> rte_eal_memory_init(void)
>> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
>> index d970825df..4a43c1a9e 100644
>> --- a/lib/librte_eal/common/include/rte_memory.h
>> +++ b/lib/librte_eal/common/include/rte_memory.h
>> @@ -423,6 +423,69 @@ int __rte_experimental
>> rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>> size_t *offset);
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Register external memory chunk with DPDK.
>> + *
>> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
>> + * API's.
>> + *
>> + * @note This API will not perform any DMA mapping. It is expected that user
>> + * will do that themselves.
>> + *
>> + * @param va_addr
>> + * Start of virtual area to register
>> + * @param len
>> + * Length of virtual area to register
>> + * @param iova_addrs
>> + * Array of page IOVA addresses corresponding to each page in this memory
>> + * area. Can be NULL, in which case page IOVA addresses will be set to
>> + * RTE_BAD_IOVA.
>> + * @param n_pages
>> + * Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
>> + * is NULL.
>> + * @param page_sz
>> + * Page size of the underlying memory
>> + *
>> + * @return
>> + * - 0 on success
>> + * - -1 in case of error, with rte_errno set to one of the following:
>> + * EINVAL - one of the parameters was invalid
>> + * EEXIST - memory chunk is already registered
>> + * ENOSPC - no more space in internal config to store a new memory chunk
>> + */
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> + unsigned int n_pages, size_t page_sz);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Unregister external memory chunk with DPDK.
>> + *
>> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
>> + * API's.
>> + *
>> + * @note This API will not perform any DMA unmapping. It is expected that user
>> + * will do that themselves.
>> + *
>> + * @param va_addr
>> + * Start of virtual area to unregister
>> + * @param len
>> + * Length of virtual area to unregister
>> + *
>> + * @return
>> + * - 0 on success
>> + * - -1 in case of error, with rte_errno set to one of the following:
>> + * EINVAL - one of the parameters was invalid
>> + * ENOENT - memory chunk was not found
>> + */
>> +int __rte_experimental
>> +rte_extmem_unregister(void *va_addr, size_t len);
>> +
>> /**
>> * Dump the physical memory layout to a file.
>> *
>> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
>> index 3fe78260d..593691a14 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>> rte_devargs_remove;
>> rte_devargs_type_count;
>> rte_eal_cleanup;
>> + rte_extmem_register;
>> + rte_extmem_unregister;
>> rte_fbarray_attach;
>> rte_fbarray_destroy;
>> rte_fbarray_detach;
>> --
>> 2.17.1
>
--
Thanks,
Anatoly