From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 363C0A00C5; Thu, 15 Sep 2022 09:40:20 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id E0AB14021D; Thu, 15 Sep 2022 09:40:19 +0200 (CEST) Received: from smartserver.smartsharesystems.com (smartserver.smartsharesystems.com [77.243.40.215]) by mails.dpdk.org (Postfix) with ESMTP id C844340156 for ; Thu, 15 Sep 2022 09:40:18 +0200 (CEST) Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Subject: RE: [RFC v2] eal/linux: add support for fast virt/iova translation X-MimeOLE: Produced By Microsoft Exchange V6.5 Date: Thu, 15 Sep 2022 09:40:14 +0200 Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D87313@smartserver.smartshare.dk> In-Reply-To: <20220914211201.32940-2-donw@xsightlabs.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [RFC v2] eal/linux: add support for fast virt/iova translation Thread-Index: AdjIfsg20/WTS3AEQeeK2oB7svIujwAUb9Cg References: <20220914211201.32940-1-donw@xsightlabs.com> <20220914211201.32940-2-donw@xsightlabs.com> From: =?iso-8859-1?Q?Morten_Br=F8rup?= To: "Don Wallwork" , , Cc: , X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org > From: Don Wallwork [mailto:donw@xsightlabs.com] > Sent: Wednesday, 14 September 2022 23.12 >=20 > This patch maps hugepage memory such that address translation from > virtual to iova or vice versa can be done by simple addition/ > subtraction of a constant value without any page table walks. >=20 > A new '--const-translate' EAL option is added to enable this mode. >=20 > The following example describes how this works: >=20 > Say you have a system with 4 huge pages that are 1G each and the > physical addresses are 10, 11, 17 and 22G. If we map 13G of virtual > address space, that will be enough to cover all of the huge page > physical addresses. >=20 > If the VA region starts at 1G, all of the hugepage PAs can > be mapped into that region as shown below under Proposed > heading. For comparison, existing mapping that would be > done in legacy mode is shown under the Current heading. >=20 > Proposed Current (Legacy) >=20 > VA | PA VA | PA > ----+---- ----+---- > 1G | 10G 1G | 10G > 2G | 11G 2G | 11G > 3G | - 3G | - > 4G | - 4G | 17G > 5G | - 5G | - > 6G | - 6G | 22G > 7G | - > 8G | 17G > 9G | - > 10G | - > 11G | - > 12G | - > 13G | 22G >=20 > So in this example, we have a fixed offset of 9G to translate > between VA to PA or vice versa.This works whether the huge > pages happen to be allocated statically (legacy mode) or > dynamically. >=20 > The unused VA address space from 3G-7G and 9G-12G can be > unmapped in just two unmap calls. >=20 > This patch applies to legacy-mem mode only. > --- [...] > diff --git a/lib/eal/include/rte_memory.h > b/lib/eal/include/rte_memory.h > index 68b069fd04..c87777ca01 100644 > --- a/lib/eal/include/rte_memory.h > +++ b/lib/eal/include/rte_memory.h > @@ -134,6 +134,34 @@ rte_iova_t rte_mem_virt2iova(const void *virt); > void * > rte_mem_iova2virt(rte_iova_t iova); >=20 > +/** > + * Get IO virtual address of any mapped virtual address in the = current > process. > + * > + * @note This function provides a fast implementation of virtual to > physical > + * addresses that does not walk any page tables. Suitable for > use in > + * data plane threads. > + * > + * @param virt > + * The virtual address. > + * @return > + * The IO address or RTE_BAD_IOVA on error. > + */ > +rte_iova_t rte_mem_fast_virt2iova(const void *virt); > + > +/** > + * Get virtual memory address corresponding to iova address. > + * > + * @note This function provides a fast implementation of physical to > virtual to > + * addresses. Suitable for use in data plane threads. > + * > + * @param iova > + * The iova address. > + * @return > + * Virtual address corresponding to iova address (or NULL if = address > does not > + * exist within DPDK memory map). > + */ > +void *rte_mem_fast_iova2virt(rte_iova_t iova); > + > /** > * Get memseg to which a particular virtual address belongs. > * rte_mem_fast_virt2iova() and rte_mem_fast_iova2virt() serve the same = purpose as rte_mem_virt2iova() and rte_mem_iova2virt(). Can't this = alternative algorithm be implemented in the existing functions, or is = there a reason for adding separate functions? Minor detail about names: The DPDK community has a general aversion = against using "fast" in function names - because everything in DPDK is = supposed to be fast. If they need to be separate, perhaps use "_fp" (for = fast path) at the end of their names, like the Trace library's = RTE_TRACE_POINT_FP() variant of the RTE_TRACE_POINT() macro. If these two functions need to be really fast, they could be inline = instead of proper functions. In my opinion, they are short enough to = allow inlining. And, if merged with the original functions, the fast = code path could be inline and fall back to calling the original, slower = functions (which would then need to be renamed). Others might disagree = about inlining. Reusing the existing names might also probably break the = ABI. >From a high level perspective - and referring to my previous argument = about optimizations not being features [1] - I wonder if this patch = should be a build time option instead of a runtime feature? @Don: My preference for build time options is somewhat controversial, so = don't waste time converting to a build time option before the community = has provided feedback on the subject. [1]: = http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D870EB@smartser= ver.smartshare.dk/ [...] > diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c > index c890c42106..3fefb3dc9d 100644 > --- a/lib/eal/linux/eal_memory.c > +++ b/lib/eal/linux/eal_memory.c > @@ -148,6 +148,47 @@ rte_mem_virt2iova(const void *virtaddr) > return rte_mem_virt2phy(virtaddr); > } >=20 > +static void *const_va_pa_delta; This variable is primarily used as an address offset, not a pointer, so = consider making it "uintptr_t" instead of "void *". > + > +#ifdef RTE_MEM_SANITY_CHECK > +#define __rte_mem_validate(v) rte_mem_validate(v) > + > +static int rte_mem_validate(const void *virtaddr) > +{ > + if (!rte_mem_virt2memseg(virt, NULL)) { > + RTE_LOG(ERR, EAL, "Invalid virtual address %p\n", > virtaddr); > + return -1; > + } > + return 0; > +} > +#else > +#define __rte_mem_validate(v) 0 > +#endif > + > +rte_iova_t rte_mem_fast_virt2iova(const void *virtaddr) > +{ > + if (rte_eal_iova_mode() =3D=3D RTE_IOVA_VA) > + return (uintptr_t)virtaddr; > + > + if (__rte_mem_validate(virtaddr) !=3D 0) > + return RTE_BAD_IOVA; > + > + return (rte_iova_t)((uintptr_t)virtaddr - > (uintptr_t)const_va_pa_delta); You can probably use RTE_PTR_SUB() instead. > +} > + > +void *rte_mem_fast_iova2virt(rte_iova_t iova) > +{ > + if (rte_eal_iova_mode() =3D=3D RTE_IOVA_VA) > + return (void *)(uintptr_t)iova; > + > + void *virtaddr =3D (void *)((uintptr_t)const_va_pa_delta + iova); You can probably use RTE_PTR_ADD() instead. > + > + if (__rte_mem_validate(virtaddr) !=3D 0) > + return NULL; > + > + return virtaddr; > +} > +