DPDK patches and discussions
 help / color / mirror / Atom feed
From: Don Wallwork <donw@xsightlabs.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
	dev@dpdk.org, bruce.richardson@intel.com
Cc: dmitry.kozliuk@gmail.com, anatoly.burakov@intel.com
Subject: Re: [RFC v2] eal/linux: add support for fast virt/iova translation
Date: Thu, 15 Sep 2022 08:13:43 -0400	[thread overview]
Message-ID: <485837b0-9e0f-f023-7dfc-170bddda26e4@xsightlabs.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D87313@smartserver.smartshare.dk>

On 9/15/2022 3:40 AM, Morten Brørup wrote:
>> From: Don Wallwork [mailto:donw@xsightlabs.com]
>> Sent: Wednesday, 14 September 2022 23.12
>>
>> This patch maps hugepage memory such that address translation from
>> virtual to iova or vice versa can be done by simple addition/
>> subtraction of a constant value without any page table walks.
>>
>> A new '--const-translate' EAL option is added to enable this mode.
[...]
>> diff --git a/lib/eal/include/rte_memory.h
>> b/lib/eal/include/rte_memory.h
>> index 68b069fd04..c87777ca01 100644
>> --- a/lib/eal/include/rte_memory.h
>> +++ b/lib/eal/include/rte_memory.h
>> @@ -134,6 +134,34 @@ rte_iova_t rte_mem_virt2iova(const void *virt);
>>   void *
>>   rte_mem_iova2virt(rte_iova_t iova);
>>
>> +/**
>> + * Get IO virtual address of any mapped virtual address in the current
>> process.
>> + *
>> + * @note This function provides a fast implementation of virtual to
>> physical
>> + *       addresses that does not walk any page tables.  Suitable for
>> use in
>> + *       data plane threads.
>> + *
>> + * @param virt
>> + *   The virtual address.
>> + * @return
>> + *   The IO address or RTE_BAD_IOVA on error.
>> + */
>> +rte_iova_t rte_mem_fast_virt2iova(const void *virt);
>> +
>> +/**
>> + * Get virtual memory address corresponding to iova address.
>> + *
>> + * @note This function provides a fast implementation of physical to
>> virtual to
>> + *       addresses.  Suitable for use in data plane threads.
>> + *
>> + * @param iova
>> + *   The iova address.
>> + * @return
>> + *   Virtual address corresponding to iova address (or NULL if address
>> does not
>> + *   exist within DPDK memory map).
>> + */
>> +void *rte_mem_fast_iova2virt(rte_iova_t iova);
>> +
>>   /**
>>    * Get memseg to which a particular virtual address belongs.
>>    *
> rte_mem_fast_virt2iova() and rte_mem_fast_iova2virt() serve the same purpose as rte_mem_virt2iova() and rte_mem_iova2virt(). Can't this alternative algorithm be implemented in the existing functions, or is there a reason for adding separate functions?
It is necessary to keep a version of the translation functions that does 
the page table walks since that method must be used at init time.  My 
intention was to add new functions that are safe for use in fast path code.
>
> Minor detail about names: The DPDK community has a general aversion against using "fast" in function names - because everything in DPDK is supposed to be fast. If they need to be separate, perhaps use "_fp" (for fast path) at the end of their names, like the Trace library's RTE_TRACE_POINT_FP() variant of the RTE_TRACE_POINT() macro.
Thought that might be the case.. I can change to _fp.
>
> If these two functions need to be really fast, they could be inline instead of proper functions. In my opinion, they are short enough to allow inlining. And, if merged with the original functions, the fast code path could be inline and fall back to calling the original, slower functions (which would then need to be renamed). Others might disagree about inlining. Reusing the existing names might also probably break the ABI.
Agree that this could be inlined, but probably best not to merge with 
the page walk version since it would not be desirable for fast path code 
to fall back to that mode.  Also since this patch implements support for 
Linux EAL only, the inline functions may need to be added to a Linux 
specific header.
>
>  From a high level perspective - and referring to my previous argument about optimizations not being features [1] - I wonder if this patch should be a build time option instead of a runtime feature?
> @Don: My preference for build time options is somewhat controversial, so don't waste time converting to a build time option before the community has provided feedback on the subject.
>
> [1]: https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Finbox.dpdk.org%2Fdev%2F98CBD80474FA8B44BF855DF32C47DC35D870EB%40smartserver.smartshare.dk%2F&amp;data=05%7C01%7Cdonw%40xsightlabs.com%7C27c0102845d042a17bb408da96ed96f7%7C646a3e3483ea42739177ab01923abaa9%7C0%7C0%7C637988244443228274%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=bW9ZmyZ9Ap9rAG5juYGEUgvCNTCJ96Q1IwPf8ExaEGI%3D&amp;reserved=0
>
> [...]
>
>> diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
>> index c890c42106..3fefb3dc9d 100644
>> --- a/lib/eal/linux/eal_memory.c
>> +++ b/lib/eal/linux/eal_memory.c
>> @@ -148,6 +148,47 @@ rte_mem_virt2iova(const void *virtaddr)
>>   	return rte_mem_virt2phy(virtaddr);
>>   }
>>
>> +static void *const_va_pa_delta;
> This variable is primarily used as an address offset, not a pointer, so consider making it "uintptr_t" instead of "void *".
Good point.
>> +
>> +#ifdef RTE_MEM_SANITY_CHECK
>> +#define __rte_mem_validate(v) rte_mem_validate(v)
>> +
>> +static int rte_mem_validate(const void *virtaddr)
>> +{
>> +	if (!rte_mem_virt2memseg(virt, NULL)) {
>> +		RTE_LOG(ERR, EAL, "Invalid virtual address %p\n",
>> virtaddr);
>> +		return -1;
>> +	}
>> +	return 0;
>> +}
>> +#else
>> +#define __rte_mem_validate(v) 0
>> +#endif
>> +
>> +rte_iova_t rte_mem_fast_virt2iova(const void *virtaddr)
>> +{
>> +	if (rte_eal_iova_mode() == RTE_IOVA_VA)
>> +		return (uintptr_t)virtaddr;
>> +
>> +	if (__rte_mem_validate(virtaddr) != 0)
>> +		return RTE_BAD_IOVA;
>> +
>> +	return (rte_iova_t)((uintptr_t)virtaddr -
>> (uintptr_t)const_va_pa_delta);
> You can probably use RTE_PTR_SUB() instead.
Will do.
>
>> +}
>> +
>> +void *rte_mem_fast_iova2virt(rte_iova_t iova)
>> +{
>> +	if (rte_eal_iova_mode() == RTE_IOVA_VA)
>> +		return (void *)(uintptr_t)iova;
>> +
>> +	void *virtaddr = (void *)((uintptr_t)const_va_pa_delta + iova);
> You can probably use RTE_PTR_ADD() instead.
Agreed.

-Don
>
>> +
>> +	if (__rte_mem_validate(virtaddr) != 0)
>> +		return NULL;
>> +
>> +	return virtaddr;
>> +}
>> +


  reply	other threads:[~2022-09-15 12:13 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-14 21:12 Don Wallwork
2022-09-14 21:12 ` Don Wallwork
2022-09-15  7:40   ` Morten Brørup
2022-09-15 12:13     ` Don Wallwork [this message]
2022-09-15 12:49       ` Morten Brørup
2022-09-16 19:01   ` Don Wallwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=485837b0-9e0f-f023-7dfc-170bddda26e4@xsightlabs.com \
    --to=donw@xsightlabs.com \
    --cc=anatoly.burakov@intel.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=dmitry.kozliuk@gmail.com \
    --cc=mb@smartsharesystems.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).