From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Don Wallwork" <donw@xsightlabs.com>, <dev@dpdk.org>,
<bruce.richardson@intel.com>
Cc: <dmitry.kozliuk@gmail.com>, <anatoly.burakov@intel.com>
Subject: RE: [RFC v2] eal/linux: add support for fast virt/iova translation
Date: Thu, 15 Sep 2022 09:40:14 +0200 [thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D87313@smartserver.smartshare.dk> (raw)
In-Reply-To: <20220914211201.32940-2-donw@xsightlabs.com>
> From: Don Wallwork [mailto:donw@xsightlabs.com]
> Sent: Wednesday, 14 September 2022 23.12
>
> This patch maps hugepage memory such that address translation from
> virtual to iova or vice versa can be done by simple addition/
> subtraction of a constant value without any page table walks.
>
> A new '--const-translate' EAL option is added to enable this mode.
>
> The following example describes how this works:
>
> Say you have a system with 4 huge pages that are 1G each and the
> physical addresses are 10, 11, 17 and 22G. If we map 13G of virtual
> address space, that will be enough to cover all of the huge page
> physical addresses.
>
> If the VA region starts at 1G, all of the hugepage PAs can
> be mapped into that region as shown below under Proposed
> heading. For comparison, existing mapping that would be
> done in legacy mode is shown under the Current heading.
>
> Proposed Current (Legacy)
>
> VA | PA VA | PA
> ----+---- ----+----
> 1G | 10G 1G | 10G
> 2G | 11G 2G | 11G
> 3G | - 3G | -
> 4G | - 4G | 17G
> 5G | - 5G | -
> 6G | - 6G | 22G
> 7G | -
> 8G | 17G
> 9G | -
> 10G | -
> 11G | -
> 12G | -
> 13G | 22G
>
> So in this example, we have a fixed offset of 9G to translate
> between VA to PA or vice versa.This works whether the huge
> pages happen to be allocated statically (legacy mode) or
> dynamically.
>
> The unused VA address space from 3G-7G and 9G-12G can be
> unmapped in just two unmap calls.
>
> This patch applies to legacy-mem mode only.
> ---
[...]
> diff --git a/lib/eal/include/rte_memory.h
> b/lib/eal/include/rte_memory.h
> index 68b069fd04..c87777ca01 100644
> --- a/lib/eal/include/rte_memory.h
> +++ b/lib/eal/include/rte_memory.h
> @@ -134,6 +134,34 @@ rte_iova_t rte_mem_virt2iova(const void *virt);
> void *
> rte_mem_iova2virt(rte_iova_t iova);
>
> +/**
> + * Get IO virtual address of any mapped virtual address in the current
> process.
> + *
> + * @note This function provides a fast implementation of virtual to
> physical
> + * addresses that does not walk any page tables. Suitable for
> use in
> + * data plane threads.
> + *
> + * @param virt
> + * The virtual address.
> + * @return
> + * The IO address or RTE_BAD_IOVA on error.
> + */
> +rte_iova_t rte_mem_fast_virt2iova(const void *virt);
> +
> +/**
> + * Get virtual memory address corresponding to iova address.
> + *
> + * @note This function provides a fast implementation of physical to
> virtual to
> + * addresses. Suitable for use in data plane threads.
> + *
> + * @param iova
> + * The iova address.
> + * @return
> + * Virtual address corresponding to iova address (or NULL if address
> does not
> + * exist within DPDK memory map).
> + */
> +void *rte_mem_fast_iova2virt(rte_iova_t iova);
> +
> /**
> * Get memseg to which a particular virtual address belongs.
> *
rte_mem_fast_virt2iova() and rte_mem_fast_iova2virt() serve the same purpose as rte_mem_virt2iova() and rte_mem_iova2virt(). Can't this alternative algorithm be implemented in the existing functions, or is there a reason for adding separate functions?
Minor detail about names: The DPDK community has a general aversion against using "fast" in function names - because everything in DPDK is supposed to be fast. If they need to be separate, perhaps use "_fp" (for fast path) at the end of their names, like the Trace library's RTE_TRACE_POINT_FP() variant of the RTE_TRACE_POINT() macro.
If these two functions need to be really fast, they could be inline instead of proper functions. In my opinion, they are short enough to allow inlining. And, if merged with the original functions, the fast code path could be inline and fall back to calling the original, slower functions (which would then need to be renamed). Others might disagree about inlining. Reusing the existing names might also probably break the ABI.
From a high level perspective - and referring to my previous argument about optimizations not being features [1] - I wonder if this patch should be a build time option instead of a runtime feature?
@Don: My preference for build time options is somewhat controversial, so don't waste time converting to a build time option before the community has provided feedback on the subject.
[1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D870EB@smartserver.smartshare.dk/
[...]
> diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
> index c890c42106..3fefb3dc9d 100644
> --- a/lib/eal/linux/eal_memory.c
> +++ b/lib/eal/linux/eal_memory.c
> @@ -148,6 +148,47 @@ rte_mem_virt2iova(const void *virtaddr)
> return rte_mem_virt2phy(virtaddr);
> }
>
> +static void *const_va_pa_delta;
This variable is primarily used as an address offset, not a pointer, so consider making it "uintptr_t" instead of "void *".
> +
> +#ifdef RTE_MEM_SANITY_CHECK
> +#define __rte_mem_validate(v) rte_mem_validate(v)
> +
> +static int rte_mem_validate(const void *virtaddr)
> +{
> + if (!rte_mem_virt2memseg(virt, NULL)) {
> + RTE_LOG(ERR, EAL, "Invalid virtual address %p\n",
> virtaddr);
> + return -1;
> + }
> + return 0;
> +}
> +#else
> +#define __rte_mem_validate(v) 0
> +#endif
> +
> +rte_iova_t rte_mem_fast_virt2iova(const void *virtaddr)
> +{
> + if (rte_eal_iova_mode() == RTE_IOVA_VA)
> + return (uintptr_t)virtaddr;
> +
> + if (__rte_mem_validate(virtaddr) != 0)
> + return RTE_BAD_IOVA;
> +
> + return (rte_iova_t)((uintptr_t)virtaddr -
> (uintptr_t)const_va_pa_delta);
You can probably use RTE_PTR_SUB() instead.
> +}
> +
> +void *rte_mem_fast_iova2virt(rte_iova_t iova)
> +{
> + if (rte_eal_iova_mode() == RTE_IOVA_VA)
> + return (void *)(uintptr_t)iova;
> +
> + void *virtaddr = (void *)((uintptr_t)const_va_pa_delta + iova);
You can probably use RTE_PTR_ADD() instead.
> +
> + if (__rte_mem_validate(virtaddr) != 0)
> + return NULL;
> +
> + return virtaddr;
> +}
> +
next prev parent reply other threads:[~2022-09-15 7:40 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-14 21:12 Don Wallwork
2022-09-14 21:12 ` Don Wallwork
2022-09-15 7:40 ` Morten Brørup [this message]
2022-09-15 12:13 ` Don Wallwork
2022-09-15 12:49 ` Morten Brørup
2022-09-16 19:01 ` Don Wallwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=98CBD80474FA8B44BF855DF32C47DC35D87313@smartserver.smartshare.dk \
--to=mb@smartsharesystems.com \
--cc=anatoly.burakov@intel.com \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=dmitry.kozliuk@gmail.com \
--cc=donw@xsightlabs.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).