From mboxrd@z Thu Jan 1 00:00:00 1970
To: Yongseok Koh
Cc: "dev@dpdk.org", John McNamara, Marko Kovacevic, Shahaf Shuler,
 Thomas Monjalon, "shreyansh.jain@nxp.com"
References: <20181214095531.GC12221@mtidpdk.mti.labs.mlnx>
From: "Burakov, Anatoly"
Date: Fri, 14 Dec 2018 11:03:14 +0000
In-Reply-To: <20181214095531.GC12221@mtidpdk.mti.labs.mlnx>
Subject: Re: [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas
List-Id: DPDK patches and discussions

On 14-Dec-18 9:55 AM, Yongseok Koh wrote:
> On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
>> The general use-case of using external memory is well covered by
>> existing external memory API's. However, certain use cases require
>> manual management of externally allocated memory areas, so this
>> memory should not be added to the heap.
>> It should, however, be added to DPDK's internal structures, so that API's
>> like ``rte_virt2memseg`` would work on such external memory segments.
>>
>> This commit adds such an API to DPDK. The new functions will allow
>> to register and unregister externally allocated memory areas, as
>> well as documentation for them.
>>
>> Signed-off-by: Anatoly Burakov
>> ---
>>  .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
>>  lib/librte_eal/common/eal_common_memory.c     | 74 +++++++++++++++++++
>>  lib/librte_eal/common/include/rte_memory.h    | 63 ++++++++++++++++
>>  lib/librte_eal/rte_eal_version.map            |  2 +
>>  4 files changed, 189 insertions(+), 10 deletions(-)
>>
>> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
>> index 8b5d050c7..d7799b626 100644
>> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
>> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
>> @@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
>>  Support for Externally Allocated Memory
>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> -It is possible to use externally allocated memory in DPDK, using a set of malloc
>> -heap API's. Support for externally allocated memory is implemented through
>> -overloading the socket ID - externally allocated heaps will have socket ID's
>> -that would be considered invalid under normal circumstances. Requesting an
>> -allocation to take place from a specified externally allocated memory is a
>> -matter of supplying the correct socket ID to DPDK allocator, either directly
>> -(e.g. through a call to ``rte_malloc``) or indirectly (through data
>> -structure-specific allocation API's such as ``rte_ring_create``).
>> +It is possible to use externally allocated memory in DPDK. There are two ways in
>> +which using externally allocated memory can work: the malloc heap API's, and
>> +manual memory management.
>>
>> -Since there is no way DPDK can verify whether memory are is available or valid,
>> -this responsibility falls on the shoulders of the user. All multiprocess
>> ++ Using heap API's for externally allocated memory
>> +
>> +Using using a set of malloc heap API's is the recommended way to use externally
>> +allocated memory in DPDK. In this way, support for externally allocated memory
>> +is implemented through overloading the socket ID - externally allocated heaps
>> +will have socket ID's that would be considered invalid under normal
>> +circumstances. Requesting an allocation to take place from a specified
>> +externally allocated memory is a matter of supplying the correct socket ID to
>> +DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
>> +indirectly (through data structure-specific allocation API's such as
>> +``rte_ring_create``). Using these API's also ensures that mapping of externally
>> +allocated memory for DMA is also performed on any memory segment that is added
>> +to a DPDK malloc heap.
>> +
>> +Since there is no way DPDK can verify whether memory is available or valid, this
>> +responsibility falls on the shoulders of the user. All multiprocess
>>  synchronization is also user's responsibility, as well as ensuring that all
>>  calls to add/attach/detach/remove memory are done in the correct order. It is
>>  not required to attach to a memory area in all processes - only attach to memory
>> @@ -246,6 +255,37 @@ The expected workflow is as follows:
>>  For more information, please refer to ``rte_malloc`` API documentation,
>>  specifically the ``rte_malloc_heap_*`` family of function calls.
>>
>> ++ Using externally allocated memory without DPDK API's
>> +
>> +While using heap API's is the recommended method of using externally allocated
>> +memory in DPDK, there are certain use cases where the overhead of DPDK heap API
>> +is undesirable - for example, when manual memory management is performed on an
>> +externally allocated area. To support use cases where externally allocated
>> +memory will not be used as part of normal DPDK workflow, there is also another
>> +set of API's under the ``rte_extmem_*`` namespace.
>> +
>> +These API's are (as their name implies) intended to allow registering or
>> +unregistering externally allocated memory to/from DPDK's internal page table, to
>> +allow API's like ``rte_virt2memseg`` etc. to work with externally allocated
>> +memory. Memory added this way will not be available for any regular DPDK
>> +allocators; DPDK will leave this memory for the user application to manage.
>> +
>> +The expected workflow is as follows:
>> +
>> +* Get a pointer to memory area
>> +* Register memory within DPDK
>> +    - If IOVA table is not specified, IOVA addresses will be assumed to be
>> +      unavailable
>> +* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
>> +* Use the memory area in your application
>> +* If memory area is no longer needed, it can be unregistered
>> +    - If the area was mapped for DMA, unmapping must be performed before
>> +      unregistering memory
>> +
>> +Since these externally allocated memory areas will not be managed by DPDK, it is
>> +therefore up to the user application to decide how to use them and what to do
>> +with them once they're registered.
>> +
>>  Per-lcore and Shared Variables
>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
>> index d47ea4938..a2e085ae8 100644
>> --- a/lib/librte_eal/common/eal_common_memory.c
>> +++ b/lib/librte_eal/common/eal_common_memory.c
>> @@ -24,6 +24,7 @@
>>  #include "eal_memalloc.h"
>>  #include "eal_private.h"
>>  #include "eal_internal_cfg.h"
>> +#include "malloc_heap.h"
>>
>>  /*
>>   * Try to mmap *size bytes in /dev/zero. If it is successful, return the
>> @@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>>  	return ret;
>>  }
>>
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> +		unsigned int n_pages, size_t page_sz)
>> +{
>> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> +	unsigned int socket_id;
>> +	int ret = 0;
>> +
>> +	if (va_addr == NULL || page_sz == 0 || len == 0 ||
>> +			!rte_is_power_of_2(page_sz) ||
>> +			RTE_ALIGN(len, page_sz) != len) {
>> +		rte_errno = EINVAL;
>> +		return -1;
>> +	}
>
> Isn't it better to have more sanity check? E.g, (len / page_sz == n_pages) like
> rte_malloc_heap_memory_add(). And what about the alignment of va_addr? Shouldn't
> it be page-aligned if I'm not mistaken? rte_malloc_heap_memory_add() doesn't
> have it either... Also you might want to add it to documentation that
> granularity of these registrations is a page.
>

Hi Yongseok,

Thanks for your review. n_pages is allowed to be 0 if iova_addrs[] is NULL.
However, you're correct in that more sanity checking and documentation re: page
alignment would be beneficial. I'll submit a v2.

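For reference, a minimal standalone sketch of what the extra checks could look
like. This is only an illustration under assumptions, not the v2 code: the
helper name extmem_args_valid is hypothetical, uint64_t stands in for
rte_iova_t, and plain bit masks replace rte_is_power_of_2() and RTE_ALIGN() so
that the snippet compiles without DPDK headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): returns true if the
 * arguments pass the existing checks plus the two suggested ones.
 */
static bool
extmem_args_valid(const void *va_addr, size_t len, const uint64_t *iova_addrs,
		unsigned int n_pages, size_t page_sz)
{
	/* existing checks: non-NULL start, non-zero sizes, power-of-2 page */
	if (va_addr == NULL || len == 0 || page_sz == 0 ||
			(page_sz & (page_sz - 1)) != 0)
		return false;
	/* existing check: length must be a whole number of pages */
	if ((len & (page_sz - 1)) != 0)
		return false;
	/* suggested check: start address must be page-aligned */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0)
		return false;
	/* suggested check: page count must match, only if an IOVA table is given */
	if (iova_addrs != NULL && len / page_sz != n_pages)
		return false;
	return true;
}
```

With these rules, a call with iova_addrs == NULL and n_pages == 0 still passes,
while an unaligned va_addr or a mismatched page count is rejected.
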
> Otherwise,
>
> Acked-by: Yongseok Koh
> Thanks
>
>> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>> +
>> +	/* make sure the segment doesn't already exist */
>> +	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
>> +		rte_errno = EEXIST;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* get next available socket ID */
>> +	socket_id = mcfg->next_socket_id;
>> +	if (socket_id > INT32_MAX) {
>> +		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
>> +		rte_errno = ENOSPC;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* we can create a new memseg */
>> +	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
>> +			page_sz, "extmem", socket_id) == NULL) {
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	/* memseg list successfully created - increment next socket ID */
>> +	mcfg->next_socket_id++;
>> +unlock:
>> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
>> +	return ret;
>> +}
>> +
>> +int __rte_experimental
>> +rte_extmem_unregister(void *va_addr, size_t len)
>> +{
>> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>> +	struct rte_memseg_list *msl;
>> +	int ret = 0;
>> +
>> +	if (va_addr == NULL || len == 0) {
>> +		rte_errno = EINVAL;
>> +		return -1;
>> +	}
>> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
>> +
>> +	/* find our segment */
>> +	msl = malloc_heap_find_external_seg(va_addr, len);
>> +	if (msl == NULL) {
>> +		rte_errno = ENOENT;
>> +		ret = -1;
>> +		goto unlock;
>> +	}
>> +
>> +	ret = malloc_heap_destroy_external_seg(msl);
>> +unlock:
>> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
>> +	return ret;
>> +}
>> +
>>  /* init memory subsystem */
>>  int
>>  rte_eal_memory_init(void)
>> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
>> index d970825df..4a43c1a9e 100644
>> --- a/lib/librte_eal/common/include/rte_memory.h
>> +++ b/lib/librte_eal/common/include/rte_memory.h
>> @@ -423,6 +423,69 @@ int __rte_experimental
>>  rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>>  		size_t *offset);
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Register external memory chunk with DPDK.
>> + *
>> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
>> + *   API's.
>> + *
>> + * @note This API will not perform any DMA mapping. It is expected that user
>> + *   will do that themselves.
>> + *
>> + * @param va_addr
>> + *   Start of virtual area to register
>> + * @param len
>> + *   Length of virtual area to register
>> + * @param iova_addrs
>> + *   Array of page IOVA addresses corresponding to each page in this memory
>> + *   area. Can be NULL, in which case page IOVA addresses will be set to
>> + *   RTE_BAD_IOVA.
>> + * @param n_pages
>> + *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
>> + *   is NULL.
>> + * @param page_sz
>> + *   Page size of the underlying memory
>> + *
>> + * @return
>> + *   - 0 on success
>> + *   - -1 in case of error, with rte_errno set to one of the following:
>> + *     EINVAL - one of the parameters was invalid
>> + *     EEXIST - memory chunk is already registered
>> + *     ENOSPC - no more space in internal config to store a new memory chunk
>> + */
>> +int __rte_experimental
>> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
>> +		unsigned int n_pages, size_t page_sz);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice
>> + *
>> + * Unregister external memory chunk with DPDK.
>> + *
>> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
>> + *   API's.
>> + *
>> + * @note This API will not perform any DMA unmapping. It is expected that user
>> + *   will do that themselves.
>> + *
>> + * @param va_addr
>> + *   Start of virtual area to unregister
>> + * @param len
>> + *   Length of virtual area to unregister
>> + *
>> + * @return
>> + *   - 0 on success
>> + *   - -1 in case of error, with rte_errno set to one of the following:
>> + *     EINVAL - one of the parameters was invalid
>> + *     ENOENT - memory chunk was not found
>> + */
>> +int __rte_experimental
>> +rte_extmem_unregister(void *va_addr, size_t len);
>> +
>>  /**
>>   * Dump the physical memory layout to a file.
>>   *
>> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
>> index 3fe78260d..593691a14 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>>  	rte_devargs_remove;
>>  	rte_devargs_type_count;
>>  	rte_eal_cleanup;
>> +	rte_extmem_register;
>> +	rte_extmem_unregister;
>>  	rte_fbarray_attach;
>>  	rte_fbarray_destroy;
>>  	rte_fbarray_detach;
>> --
>> 2.17.1
>

--
Thanks,
Anatoly