DPDK patches and discussions
From: Sergio Gonzalez Monroy <sergio.gonzalez.monroy@intel.com>
To: "Walker, Benjamin" <benjamin.walker@intel.com>,
	"Tan, Jianfeng" <jianfeng.tan@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Running DPDK as an unprivileged user
Date: Thu, 5 Jan 2017 10:16:53 +0000	[thread overview]
Message-ID: <e554e14f-6e3a-7d12-2df0-690a6a06df80@intel.com> (raw)
In-Reply-To: <d3ec1c98-a394-00c5-36a8-6ec9d839b65c@intel.com>

On 05/01/2017 10:09, Sergio Gonzalez Monroy wrote:
> On 04/01/2017 21:34, Walker, Benjamin wrote:
>> On Wed, 2017-01-04 at 19:39 +0800, Tan, Jianfeng wrote:
>>> Hi Benjamin,
>>>
>>>
>>> On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
>>>> DPDK today begins by allocating all of the required
>>>> hugepages, then finds all of the physical addresses for
>>>> those hugepages using /proc/self/pagemap, sorts the
>>>> hugepages by physical address, then remaps the pages to
>>>> contiguous virtual addresses. Later on and if vfio is
>>>> enabled, it asks vfio to pin the hugepages and to set their
>>>> DMA addresses in the IOMMU to be the physical addresses
>>>> discovered earlier. Of course, running as an unprivileged
>>>> user means all of the physical addresses in
>>>> /proc/self/pagemap are just 0, so this doesn't end up
>>>> working. Further, there is no real reason to choose the
>>>> physical address as the DMA address in the IOMMU - it would
>>>> be better to just count up starting at 0.
>>> Why not just use the virtual address as the DMA address in this case,
>>> to avoid maintaining another kind of address?
>> That's a valid choice, although I'm just storing the DMA address in the
>> physical address field that already exists. You either have a physical
>> address or a DMA address and never both.
>>
>>>>    Also, because the
>>>> pages are pinned after the virtual to physical mapping is
>>>> looked up, there is a window where a page could be moved.
>>>> Hugepage mappings can be moved on more recent kernels (at
>>>> least 4.x), and the reliability of hugepages having static
>>>> mappings decreases with every kernel release.
>>> Do you mean the kernel might take back a physical page after mapping
>>> it to a virtual page (perhaps copying the data to another physical
>>> page)? Could you please point to some links or kernel commits?
>> Yes - the kernel can move a physical page to another physical page
>> and change the virtual mapping at any time. For a concise example
>> see 'man migrate_pages(2)', or for a more serious example the code
>> that performs memory page compaction in the kernel which was
>> recently extended to support hugepages.
>>
>> Before we go down the path of me proving that the mapping isn't static,
>> let me turn that line of thinking around. Do you have any documentation
>> demonstrating that the mapping is static? It's not static for 4k pages,
>> so why are we assuming that it is static for 2MB pages? I understand
>> that it happened to be static for some versions of the kernel, but my
>> understanding is that this was purely by coincidence and never by
>> intention.
>
> It looks to me as if you are talking about transparent hugepages, not
> hugetlbfs-managed hugepages (the DPDK use case).
> AFAIK memory (hugepages) managed by hugetlbfs is not compacted and/or
> moved; it is not part of the kernel memory management.
>

Please forgive my loose/poor choice of words when saying that "they
are not part of the kernel memory management"; I meant that they are
not part of the kernel memory management processes you were
mentioning, i.e. compacting, moving, etc.

Sergio

> So again, do you have some references to code/articles where this 
> "dynamic" behavior of hugepages managed by hugetlbfs is mentioned?
>
> Sergio
>
>>>> Note that this
>>>> probably means that using uio on recent kernels is subtly
>>>> broken and cannot be supported going forward because there
>>>> is no uio mechanism to pin the memory.
>>>>
>>>> The first open question I have is whether DPDK should allow
>>>> uio at all on recent (4.x) kernels. My current understanding
>>>> is that there is no way to pin memory and hugepages can now
>>>> be moved around, so uio would be unsafe. What does the
>>>> community think here?
>>>>
>>>> My second question is whether the user should be allowed to
>>>> mix uio and vfio usage simultaneously. For vfio, the
>>>> physical addresses are really DMA addresses and are best
>>>> when arbitrarily chosen to appear sequential relative to
>>>> their virtual addresses.
>>> Why "sequential relative to their virtual addresses"? The IOMMU table
>>> maps DMA addr -> physical addr. So we need DMA addresses
>>> "sequential relative to their physical addresses"? Based on your above
>>> analysis of how hugepages are initialized, isn't the virtual address a
>>> good candidate for the DMA address?
>> The code already goes through a separate organizational step on all of
>> the pages that remaps the virtual addresses such that they're sequential
>> relative to the physical backing pages, so this mostly ends up as the
>> same thing.
>> Choosing to use the virtual address is a totally valid choice, but I
>> worry it may lead to confusion during debugging or in a multi-process
>> scenario.
>> I'm open to making this choice instead of starting from zero, though.
>>
>>> Thanks,
>>> Jianfeng
>
>

Thread overview: 18+ messages
2016-12-29 20:41 Walker, Benjamin
2016-12-30  1:14 ` Stephen Hemminger
2017-01-02 14:32   ` Thomas Monjalon
2017-01-02 19:47     ` Stephen Hemminger
2017-01-03 22:50       ` Walker, Benjamin
2017-01-04 10:11         ` Thomas Monjalon
2017-01-04 21:35           ` Walker, Benjamin
2017-01-04 11:39 ` Tan, Jianfeng
2017-01-04 21:34   ` Walker, Benjamin
2017-01-05 10:09     ` Sergio Gonzalez Monroy
2017-01-05 10:16       ` Sergio Gonzalez Monroy [this message]
2017-01-05 14:58         ` Tan, Jianfeng
2017-01-05 15:52     ` Tan, Jianfeng
2017-11-05  0:17       ` Thomas Monjalon
2017-11-27 17:58         ` Walker, Benjamin
2017-11-28 14:16           ` Alejandro Lucero
2017-11-28 17:50             ` Walker, Benjamin
2017-11-28 19:13               ` Alejandro Lucero
