DPDK patches and discussions
From: "Walker, Benjamin" <benjamin.walker@intel.com>
To: "david.marchand@redhat.com" <david.marchand@redhat.com>,
	"jerinj@marvell.com" <jerinj@marvell.com>
Cc: "Burakov, Anatoly" <anatoly.burakov@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] eal/pci: Improve automatic selection of IOVA mode
Date: Mon, 3 Jun 2019 16:44:25 +0000
Message-ID: <da23632dca896ac78b998c85840b974a2d4d206d.camel@intel.com>
In-Reply-To: <CAJFAV8yijcKH3fA9QFM_GJmWkhrKc0TqOtydYeV6FWnbE3mD2Q@mail.gmail.com>

On Mon, 2019-06-03 at 12:48 +0200, David Marchand wrote:
> Hello, 
> 
> On Thu, May 30, 2019 at 7:48 PM Ben Walker <benjamin.walker@intel.com> wrote:
> > In SPDK, not all drivers are registered with DPDK at start up time.
> > Previously, that meant DPDK always chose to set itself up in IOVA_PA
> > mode. Instead, when the correct iova choice is unclear based on the
> > devices and drivers known to DPDK at start up time, use other heuristics
> > (such as whether /proc/self/pagemap is accessible) to make a better
> > choice.
> > 
> > This enables SPDK to run as an unprivileged user again without requiring
> > users to explicitly set the iova mode on the command line.
> > 
> 
> Interesting, I got a bz on something similar the day you sent this patchset ;-)
> 
> 
> - When a dpdk process is started, either it has access to physical addresses
> or not, and this won't change for the rest of its life.
> Your fix on defaulting to VA based on a rte_eal_using_phys_addrs() check makes
> sense to me.
> It is the most encountered situation when running ovs as non root on recent
> kernels.
> 
> 
> - However, I fail to see the need for all of this detection code wrt drivers
> and devices.
> 
> On one side of the equation, when dpdk starts, it checks physical address
> availability.
> On the other side of the equation, we have the drivers that will be invoked
> when probing devices (either at dpdk init, or when hotplugging a device).
> 
> At this point, the probing call should check the driver requirement wrt the
> kernel driver the device is attached to.
> If this requirement is not fulfilled, then the probing fails.
> 
> 
> - This leaves the --iova-va forcing option. 
> Why do we need it?
> If we don't have access to physical addresses, no choice but run in VA mode.
> If we have access to physical addresses, the only case would be that you want
> to downgrade from PA to VA.
> But well, your process can still access it, not sure what the benefit is.

All of the complexity here, at least as far as I understand it, stems from
supporting hot insert of devices. This is very important to SPDK because storage
devices get hot inserted all the time, so we very much appreciate that DPDK has
put in so much effort in this area and continues to accept our patches to
improve it. I know hot insert is not nearly as important for network devices.

When DPDK starts up, it needs to select whether to use virtual addresses or
physical addresses in its memory maps. It can do that by answering the following
questions:

1. Does the system only have buses that support an IOMMU?
2. Is the IOMMU sufficiently fast for the use case?
3. Will all of the devices that will be used with DPDK throughout the
application's lifetime work with an IOMMU?

If all three of these are true, then the best choice is to use virtual addresses
in the memory translations. However, if any one of them is not true, DPDK needs
to fall back to physical addresses.

#1 is checked by simply asking all of the buses, which are known up front. #2 is
just assumed to be true. But #3 is not possible to check fully because of hot
insert.
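
A minimal sketch of the decision described above, assuming the three answers
were somehow available as booleans. The type and function names
(iova_choice, iova_inputs, choose_iova) are hypothetical and are not DPDK's
API; the sketch only illustrates that virtual addresses are chosen when all
three answers are yes, and physical addresses otherwise.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not DPDK's API. */
enum iova_choice { IOVA_CHOICE_PA, IOVA_CHOICE_VA };

struct iova_inputs {
    bool all_buses_support_iommu;     /* question 1: asked of every bus up front */
    bool iommu_fast_enough;           /* question 2: currently assumed to be true */
    bool all_devices_work_with_iommu; /* question 3: not fully knowable with hot insert */
};

/* Virtual addresses only if every answer is yes; otherwise fall back to PA. */
static enum iova_choice
choose_iova(const struct iova_inputs *in)
{
    if (in->all_buses_support_iommu &&
        in->iommu_fast_enough &&
        in->all_devices_work_with_iommu)
        return IOVA_CHOICE_VA;
    return IOVA_CHOICE_PA;
}
```

In practice the third field cannot be filled in reliably at start up, which is
why the real code has to approximate it.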

The code currently approximates the #3 check by looking at the devices present
at initialization time. If a device exists that's bound to vfio-pci, and no
other devices exist that are bound to a uio driver, and DPDK has a registered
driver that's actually going to load against the vfio-pci devices, then it will
elect to use virtual addresses. This is purely a heuristic - it's not a
definitive answer because the user could later hot insert a device that gets
bound to uio.
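
The approximation of check #3 described above could be sketched roughly as
follows. This is an illustration under stated assumptions, not the actual
rte_pci_get_iommu_class code: the kdrv enum, the dev_info struct and
init_scan_prefers_va are hypothetical names, and the scan runs over a plain
array instead of the real PCI device list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
enum kdrv { KDRV_NONE, KDRV_VFIO, KDRV_UIO };

struct dev_info {
    enum kdrv kernel_driver;    /* kernel driver the device is bound to */
    bool has_registered_driver; /* a registered DPDK driver would load against it */
};

/*
 * Returns true when the start-up scan elects virtual addresses:
 * at least one vfio-bound device that a registered driver will use,
 * and no device bound to a uio driver.
 */
static bool
init_scan_prefers_va(const struct dev_info *devs, size_t n)
{
    bool usable_vfio = false;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].kernel_driver == KDRV_UIO)
            return false; /* any uio-bound device rules out VA */
        if (devs[i].kernel_driver == KDRV_VFIO &&
            devs[i].has_registered_driver)
            usable_vfio = true;
    }
    return usable_vfio;
}
```

The scan only sees devices present at initialization, so a device hot inserted
later and bound to uio is invisible to it.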

The user, of course, typically knows which addressing scheme to use. For
example, these checks assume #2 is true, but there may be
hardware implementations where it is not and the user wants to force physical
addresses. Or the user may know that they are going to hot insert a device at
run time that doesn't work with the IOMMU. That's why it's important to maintain
the ability for the user to override the default heuristic's decision via the
command line.

My patch series simply improves the heuristic in two ways. First, each bus,
when queried, previously had to return either virtual or physical addresses
as its choice. However, often the bus just does not have enough
information to formulate any preference at all (and PCI was defaulting to
physical addresses in this case). Instead, I made it so that the bus can return
that it doesn't care, which pushes the decision up to a higher level. That
higher level then makes the decision by checking whether it can access
/proc/self/pagemap. Second, I narrowed the uio check such that physical
addresses will only be selected if a device bound to uio exists and there is a
driver registered to use it. Previously if any device was bound to uio it would
select physical addresses, even if DPDK never ended up loading against that
device.
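
The two changes can be sketched as below. This is a simplified illustration,
not the code from the patch series: iova_pref, pci_dev_info, bus_iova_pref and
resolve_iova are hypothetical names, and whether physical addresses are
available (for example, whether /proc/self/pagemap is usable) is passed in as
a boolean instead of being probed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only. DC means "don't care". */
enum iova_pref { IOVA_PREF_DC, IOVA_PREF_PA, IOVA_PREF_VA };
enum bound_to { BOUND_NONE, BOUND_VFIO, BOUND_UIO };

struct pci_dev_info {
    enum bound_to kernel_driver; /* kernel driver the device is bound to */
    bool has_registered_driver;  /* a registered DPDK driver would use it */
};

/* Bus-level answer: may now be "don't care" when nothing decides it. */
static enum iova_pref
bus_iova_pref(const struct pci_dev_info *devs, size_t n)
{
    bool want_va = false;

    for (size_t i = 0; i < n; i++) {
        if (!devs[i].has_registered_driver)
            continue; /* DPDK will never load against it: ignore */
        if (devs[i].kernel_driver == BOUND_UIO)
            return IOVA_PREF_PA; /* a uio device that will actually be used */
        if (devs[i].kernel_driver == BOUND_VFIO)
            want_va = true;
    }
    return want_va ? IOVA_PREF_VA : IOVA_PREF_DC;
}

/* Higher level: break a "don't care" using physical address availability. */
static enum iova_pref
resolve_iova(enum iova_pref bus_pref, bool phys_addrs_available)
{
    if (bus_pref != IOVA_PREF_DC)
        return bus_pref;
    return phys_addrs_available ? IOVA_PREF_PA : IOVA_PREF_VA;
}
```

With no drivers registered at start up, as in the SPDK case, the bus reports
"don't care" and an unprivileged process without physical address access ends
up in virtual address mode.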

I think these two changes make the heuristic choose the right thing more often,
but it still won't always get it right, so the command line option needs to
remain.

Thanks,
Ben

> 
> 
> Jerin, I can see in the history you worked on this.
> What did I miss?
> Is there something wrong with dropping the detection code?
> 
> 
> 
> -- 
> David Marchand

