From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yongseok Koh
To: Anatoly Burakov
CC: "dev@dpdk.org", John McNamara, Marko Kovacevic, Shahaf Shuler, Thomas Monjalon, "shreyansh.jain@nxp.com"
Date: Fri, 14 Dec 2018 09:55:40 +0000
Message-ID: <20181214095531.GC12221@mtidpdk.mti.labs.mlnx>
Subject: Re: [dpdk-dev] [PATCH 3/4] mem: allow registering external memory areas
List-Id: DPDK patches and discussions

On Thu, Nov 29, 2018 at 01:48:34PM +0000, Anatoly Burakov wrote:
> The general use-case of using external memory is well covered by
> existing external memory API's. However, certain use cases require
> manual management of externally allocated memory areas, so this
> memory should not be added to the heap. It should, however, be
> added to DPDK's internal structures, so that API's like
> ``rte_virt2memseg`` would work on such external memory segments.
>
> This commit adds such an API to DPDK. The new functions will allow
> to register and unregister externally allocated memory areas, as
> well as documentation for them.
>
> Signed-off-by: Anatoly Burakov
> ---
>  .../prog_guide/env_abstraction_layer.rst      | 60 ++++++++++++---
>  lib/librte_eal/common/eal_common_memory.c     | 74 +++++++++++++++++++
>  lib/librte_eal/common/include/rte_memory.h    | 63 ++++++++++++++++
>  lib/librte_eal/rte_eal_version.map            |  2 +
>  4 files changed, 189 insertions(+), 10 deletions(-)
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 8b5d050c7..d7799b626 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -212,17 +212,26 @@ Normally, these options do not need to be changed.
>  Support for Externally Allocated Memory
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> -It is possible to use externally allocated memory in DPDK, using a set of malloc
> -heap API's. Support for externally allocated memory is implemented through
> -overloading the socket ID - externally allocated heaps will have socket ID's
> -that would be considered invalid under normal circumstances. Requesting an
> -allocation to take place from a specified externally allocated memory is a
> -matter of supplying the correct socket ID to DPDK allocator, either directly
> -(e.g. through a call to ``rte_malloc``) or indirectly (through data
> -structure-specific allocation API's such as ``rte_ring_create``).
> +It is possible to use externally allocated memory in DPDK. There are two ways in
> +which using externally allocated memory can work: the malloc heap API's, and
> +manual memory management.
>
> -Since there is no way DPDK can verify whether memory are is available or valid,
> -this responsibility falls on the shoulders of the user. All multiprocess
> ++ Using heap API's for externally allocated memory
> +
> +Using using a set of malloc heap API's is the recommended way to use externally
> +allocated memory in DPDK. In this way, support for externally allocated memory
> +is implemented through overloading the socket ID - externally allocated heaps
> +will have socket ID's that would be considered invalid under normal
> +circumstances. Requesting an allocation to take place from a specified
> +externally allocated memory is a matter of supplying the correct socket ID to
> +DPDK allocator, either directly (e.g. through a call to ``rte_malloc``) or
> +indirectly (through data structure-specific allocation API's such as
> +``rte_ring_create``). Using these API's also ensures that mapping of externally
> +allocated memory for DMA is also performed on any memory segment that is added
> +to a DPDK malloc heap.
> +
> +Since there is no way DPDK can verify whether memory is available or valid, this
> +responsibility falls on the shoulders of the user. All multiprocess
>  synchronization is also user's responsibility, as well as ensuring that all
>  calls to add/attach/detach/remove memory are done in the correct order. It is
>  not required to attach to a memory area in all processes - only attach to memory
> @@ -246,6 +255,37 @@ The expected workflow is as follows:
>  For more information, please refer to ``rte_malloc`` API documentation,
>  specifically the ``rte_malloc_heap_*`` family of function calls.
>
> ++ Using externally allocated memory without DPDK API's
> +
> +While using heap API's is the recommended method of using externally allocated
> +memory in DPDK, there are certain use cases where the overhead of DPDK heap API
> +is undesirable - for example, when manual memory management is performed on an
> +externally allocated area. To support use cases where externally allocated
> +memory will not be used as part of normal DPDK workflow, there is also another
> +set of API's under the ``rte_extmem_*`` namespace.
> +
> +These API's are (as their name implies) intended to allow registering or
> +unregistering externally allocated memory to/from DPDK's internal page table, to
> +allow API's like ``rte_virt2memseg`` etc. to work with externally allocated
> +memory. Memory added this way will not be available for any regular DPDK
> +allocators; DPDK will leave this memory for the user application to manage.
> +
> +The expected workflow is as follows:
> +
> +* Get a pointer to memory area
> +* Register memory within DPDK
> +    - If IOVA table is not specified, IOVA addresses will be assumed to be
> +      unavailable
> +* Perform DMA mapping with ``rte_vfio_dma_map`` if needed
> +* Use the memory area in your application
> +* If memory area is no longer needed, it can be unregistered
> +    - If the area was mapped for DMA, unmapping must be performed before
> +      unregistering memory
> +
> +Since these externally allocated memory areas will not be managed by DPDK, it is
> +therefore up to the user application to decide how to use them and what to do
> +with them once they're registered.
> +
>  Per-lcore and Shared Variables
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index d47ea4938..a2e085ae8 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -24,6 +24,7 @@
>  #include "eal_memalloc.h"
>  #include "eal_private.h"
>  #include "eal_internal_cfg.h"
> +#include "malloc_heap.h"
>
>  /*
>   * Try to mmap *size bytes in /dev/zero. If it is successful, return the
> @@ -775,6 +776,79 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
>  	return ret;
>  }
>
> +int __rte_experimental
> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	unsigned int socket_id;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || page_sz == 0 || len == 0 ||
> +			!rte_is_power_of_2(page_sz) ||
> +			RTE_ALIGN(len, page_sz) != len) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}

Wouldn't it be better to have more sanity checks here? E.g.,
(len / page_sz == n_pages), as in rte_malloc_heap_memory_add(). And what about
the alignment of va_addr? Shouldn't it be page-aligned, if I'm not mistaken?
rte_malloc_heap_memory_add() doesn't check that either...

You might also want to note in the documentation that the granularity of these
registrations is a page.
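
To illustrate, the extra checks could look roughly like the standalone sketch
below. This is an untested illustration, not a proposed hunk:
extmem_args_valid() is a made-up name, uint64_t stands in for rte_iova_t, and
the n_pages comparison is skipped when iova_addrs is NULL only because the
header comment in this patch says n_pages is ignored in that case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true if the arguments
 * describe a page-aligned area made up of a whole number of pages.
 */
static bool
extmem_args_valid(const void *va_addr, size_t len, const uint64_t *iova_addrs,
		unsigned int n_pages, size_t page_sz)
{
	/* basic checks, equivalent to the ones already in the patch */
	if (va_addr == NULL || len == 0 || page_sz == 0)
		return false;
	/* page size must be a power of two */
	if ((page_sz & (page_sz - 1)) != 0)
		return false;
	/* length must be a multiple of the page size */
	if ((len & (page_sz - 1)) != 0)
		return false;
	/* suggested: start address must be page-aligned */
	if (((uintptr_t)va_addr & (page_sz - 1)) != 0)
		return false;
	/* suggested: IOVA table, if given, must have one entry per page */
	if (iova_addrs != NULL && len / page_sz != n_pages)
		return false;
	return true;
}
```

With checks like these, registering two 4K pages at a 4K-aligned address with a
two-entry IOVA table would pass, while a three-entry table or an address that
is off by 16 bytes would be rejected with EINVAL.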

Otherwise,
Acked-by: Yongseok Koh

Thanks

> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* make sure the segment doesn't already exist */
> +	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
> +		rte_errno = EEXIST;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* get next available socket ID */
> +	socket_id = mcfg->next_socket_id;
> +	if (socket_id > INT32_MAX) {
> +		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
> +		rte_errno = ENOSPC;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* we can create a new memseg */
> +	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
> +			page_sz, "extmem", socket_id) == NULL) {
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	/* memseg list successfully created - increment next socket ID */
> +	mcfg->next_socket_id++;
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
> +int __rte_experimental
> +rte_extmem_unregister(void *va_addr, size_t len)
> +{
> +	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> +	struct rte_memseg_list *msl;
> +	int ret = 0;
> +
> +	if (va_addr == NULL || len == 0) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}
> +	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
> +
> +	/* find our segment */
> +	msl = malloc_heap_find_external_seg(va_addr, len);
> +	if (msl == NULL) {
> +		rte_errno = ENOENT;
> +		ret = -1;
> +		goto unlock;
> +	}
> +
> +	ret = malloc_heap_destroy_external_seg(msl);
> +unlock:
> +	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
> +	return ret;
> +}
> +
>  /* init memory subsystem */
>  int
>  rte_eal_memory_init(void)
> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> index d970825df..4a43c1a9e 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -423,6 +423,69 @@ int __rte_experimental
>  rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
>  		size_t *offset);
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Register external memory chunk with DPDK.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA mapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to register
> + * @param len
> + *   Length of virtual area to register
> + * @param iova_addrs
> + *   Array of page IOVA addresses corresponding to each page in this memory
> + *   area. Can be NULL, in which case page IOVA addresses will be set to
> + *   RTE_BAD_IOVA.
> + * @param n_pages
> + *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
> + *   is NULL.
> + * @param page_sz
> + *   Page size of the underlying memory
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     EEXIST - memory chunk is already registered
> + *     ENOSPC - no more space in internal config to store a new memory chunk
> + */
> +int __rte_experimental
> +rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> +		unsigned int n_pages, size_t page_sz);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Unregister external memory chunk with DPDK.
> + *
> + * @note Using this API is mutually exclusive with ``rte_malloc`` family of
> + *   API's.
> + *
> + * @note This API will not perform any DMA unmapping. It is expected that user
> + *   will do that themselves.
> + *
> + * @param va_addr
> + *   Start of virtual area to unregister
> + * @param len
> + *   Length of virtual area to unregister
> + *
> + * @return
> + *   - 0 on success
> + *   - -1 in case of error, with rte_errno set to one of the following:
> + *     EINVAL - one of the parameters was invalid
> + *     ENOENT - memory chunk was not found
> + */
> +int __rte_experimental
> +rte_extmem_unregister(void *va_addr, size_t len);
> +
>  /**
>   * Dump the physical memory layout to a file.
>   *
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 3fe78260d..593691a14 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -296,6 +296,8 @@ EXPERIMENTAL {
>  	rte_devargs_remove;
>  	rte_devargs_type_count;
>  	rte_eal_cleanup;
> +	rte_extmem_register;
> +	rte_extmem_unregister;
>  	rte_fbarray_attach;
>  	rte_fbarray_destroy;
>  	rte_fbarray_detach;
> --
> 2.17.1