DPDK patches and discussions
From: "Walker, Benjamin" <benjamin.walker@intel.com>
To: "Tan, Jianfeng" <jianfeng.tan@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Running DPDK as an unprivileged user
Date: Wed, 4 Jan 2017 21:34:26 +0000
Message-ID: <1483565664.9482.3.camel@intel.com>
In-Reply-To: <685186b4-e50e-c122-459b-e4635404c3f8@intel.com>

On Wed, 2017-01-04 at 19:39 +0800, Tan, Jianfeng wrote:
> Hi Benjamin,
> 
> 
> On 12/30/2016 4:41 AM, Walker, Benjamin wrote:
> > DPDK today begins by allocating all of the required
> > hugepages, then finds all of the physical addresses for
> > those hugepages using /proc/self/pagemap, sorts the
> > hugepages by physical address, then remaps the pages to
> > contiguous virtual addresses. Later on, if vfio is
> > enabled, it asks vfio to pin the hugepages and to set their
> > DMA addresses in the IOMMU to be the physical addresses
> > discovered earlier. Of course, running as an unprivileged
> > user means all of the physical addresses in
> > /proc/self/pagemap are just 0, so this doesn't end up
> > working. Further, there is no real reason to choose the
> > physical address as the DMA address in the IOMMU - it would
> > be better to just count up starting at 0.
> 
> Why not just use the virtual address as the DMA address in this case, to 
> avoid maintaining another kind of address?

That's a valid choice, although I'm just storing the DMA address in the
physical address field that already exists. You either have a physical
address or a DMA address and never both.
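
To illustrate the "count up starting at 0" idea with a single shared
address field, here is a minimal sketch in Python. It is not the actual
patch or any DPDK structure: `Hugepage`, `assign_dma_from_zero`, and the
example virtual base address are hypothetical names and values chosen
only for illustration.

```python
from dataclasses import dataclass

HUGEPAGE_SIZE = 2 * 1024 * 1024  # 2MB hugepages, as discussed in this thread


@dataclass
class Hugepage:
    """Hypothetical stand-in for a hugepage record; not a DPDK structure."""
    virt_addr: int
    # One field holds either a physical address or a DMA address;
    # a given page never needs both at once.
    phys_addr: int = 0


def assign_dma_from_zero(pages):
    """Give each page a DMA address, counting up from 0 in list order."""
    next_dma = 0
    for page in pages:
        page.phys_addr = next_dma
        next_dma += HUGEPAGE_SIZE
    return pages


# Three pages at an arbitrary, made-up virtual base address.
pages = assign_dma_from_zero(
    [Hugepage(virt_addr=0x7F0000000000 + i * HUGEPAGE_SIZE) for i in range(3)]
)
for page in pages:
    print(hex(page.virt_addr), "->", hex(page.phys_addr))
```

Under this scheme the DMA addresses depend only on page order, not on
anything read from /proc/self/pagemap.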

> 
> >   Also, because the
> > pages are pinned after the virtual to physical mapping is
> > looked up, there is a window where a page could be moved.
> > Hugepage mappings can be moved on more recent kernels (at
> > least 4.x), and the reliability of hugepages having static
> > mappings decreases with every kernel release.
> 
> Do you mean the kernel might take back a physical page after mapping it to a 
> virtual page (maybe copying the data to another physical page)? Could you 
> please point to some links or kernel commits?

Yes - the kernel can move the contents of one physical page to another
physical page and update the virtual mapping at any time. For a concise
example, see the migrate_pages(2) man page. For a more serious example,
see the code that performs memory compaction in the kernel, which was
recently extended to support hugepages.

Before we go down the path of me proving that the mapping isn't static,
let me turn that line of thinking around. Do you have any documentation
demonstrating that the mapping is static? It's not static for 4k pages, so
why are we assuming that it is static for 2MB pages? I understand that
it happened to be static for some versions of the kernel, but my understanding
is that this was purely by coincidence and never by intention.
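
For anyone who wants to observe the mapping directly, the sketch below
decodes a /proc/self/pagemap entry in Python. It assumes the documented
pagemap layout (one 64-bit entry per base page, page frame number in bits
0-54, "present" flag in bit 63). The helper names are hypothetical. When
run as an unprivileged user on recent kernels the frame number is expected
to read back as 0, which is the problem described at the start of this
thread.

```python
import os
import struct

PAGEMAP_ENTRY_BYTES = 8
PFN_MASK = (1 << 55) - 1   # bits 0-54: page frame number
PRESENT_BIT = 1 << 63      # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Split a raw 64-bit pagemap entry into (present, pfn)."""
    return bool(entry & PRESENT_BIT), entry & PFN_MASK


def read_pagemap_entry(virt_addr):
    """Read and decode the pagemap entry covering virt_addr (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    offset = (virt_addr // page_size) * PAGEMAP_ENTRY_BYTES
    with open("/proc/self/pagemap", "rb", buffering=0) as pagemap:
        pagemap.seek(offset)
        raw = pagemap.read(PAGEMAP_ENTRY_BYTES)
    (entry,) = struct.unpack("=Q", raw)  # native byte order
    return decode_pagemap_entry(entry)


# Decoding a made-up entry: present bit set, frame number 0x1234.
print(decode_pagemap_entry(PRESENT_BIT | 0x1234))
```

Sampling the same virtual address twice with sufficient privileges and
comparing the frame numbers can show that a page has moved, but it is
only an observation; it does nothing to pin the page.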

> 
> > Note that this
> > probably means that using uio on recent kernels is subtly
> > broken and cannot be supported going forward because there
> > is no uio mechanism to pin the memory.
> > 
> > The first open question I have is whether DPDK should allow
> > uio at all on recent (4.x) kernels. My current understanding
> > is that there is no way to pin memory and hugepages can now
> > be moved around, so uio would be unsafe. What does the
> > community think here?
> > 
> > My second question is whether the user should be allowed to
> > mix uio and vfio usage simultaneously. For vfio, the
> > physical addresses are really DMA addresses and are best
> > when arbitrarily chosen to appear sequential relative to
> > their virtual addresses.
> 
> Why "sequential relative to their virtual addresses"? The IOMMU table is for 
> DMA addr -> physical addr mapping. So don't we need DMA addresses that are 
> "sequential relative to their physical addresses"? Based on your above 
> analysis of how hugepages are initialized, aren't virtual addresses a good 
> candidate for the DMA address?

The code already goes through a separate organizational step on all of
the pages that remaps the virtual addresses such that they're sequential
relative to the physical backing pages, so this mostly ends up as the same
thing.

Choosing to use the virtual address is a totally valid choice, but I worry it
may lead to confusion during debugging or in a multi-process scenario.
I'm open to making this choice instead of starting from zero, though.
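
The relationship between the two choices can be sketched as follows, in
Python. This is a simplified model under stated assumptions, not DPDK
code: `layout`, `dma_from_zero`, `dma_from_virt`, and all addresses are
hypothetical. It sorts pages by physical address, assigns contiguous
virtual addresses in that order, and then compares zero-based DMA
addresses with virtual-address-based ones.

```python
HUGEPAGE_SIZE = 2 * 1024 * 1024  # 2MB hugepages


def layout(phys_addrs, virt_base):
    """Sort pages by physical address and give them contiguous virtual
    addresses starting at virt_base. Returns (virt, phys) pairs."""
    return [
        (virt_base + i * HUGEPAGE_SIZE, phys)
        for i, phys in enumerate(sorted(phys_addrs))
    ]


def dma_from_zero(mapping):
    """DMA addresses that count up from 0 in mapping order."""
    return [i * HUGEPAGE_SIZE for i in range(len(mapping))]


def dma_from_virt(mapping):
    """DMA addresses equal to the virtual addresses."""
    return [virt for virt, _ in mapping]


# Made-up physical addresses, deliberately out of order.
mapping = layout([0x40600000, 0x40200000, 0x40400000],
                 virt_base=0x7F0000000000)
zero_based = dma_from_zero(mapping)
virt_based = dma_from_virt(mapping)

# Relative to the first page, both schemes yield the same offsets.
offsets = [v - virt_based[0] for v in virt_based]
print(offsets == zero_based)
```

In this model the two schemes differ only by the constant `virt_base`,
so the choice comes down to which set of numbers is less confusing to
see in a debugger.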

> 
> Thanks,
> Jianfeng
