DPDK patches and discussions
* Re: [dpdk-dev] eal/pci: Improve automatic selection of IOVA mode
@ 2019-06-04 11:28 Jerin Jacob Kollanukkaran
  0 siblings, 0 replies; 5+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-04 11:28 UTC (permalink / raw)
  To: David Marchand, Ben Walker; +Cc: dev, Burakov, Anatoly, eric.zhang


From: David Marchand <david.marchand@redhat.com> 
Sent: Monday, June 3, 2019 4:19 PM
To: Ben Walker <benjamin.walker@intel.com>; Jerin Jacob Kollanukkaran <jerinj@marvell.com>
Cc: dev <dev@dpdk.org>; Burakov, Anatoly <anatoly.burakov@intel.com>
Subject: Re: [dpdk-dev] eal/pci: Improve automatic selection of IOVA mode


> - This leaves the --iova-va forcing option. 
> Why do we need it?
> If we don't have access to physical addresses, no choice but run in VA mode.
> If we have access to physical addresses, the only case would be that you want to downgrade from PA to VA.
> But well, your process can still access it, not sure what the benefit is.


> Jerin, I can see in the history you worked on this.
> What did I miss?
> Is there something wrong with dropping the detection code?

It was added by Eric to support virtual devices.

https://mails.dpdk.org/archives/dev/2018-September/111141.html





* Re: [dpdk-dev] eal/pci: Improve automatic selection of IOVA mode
  2019-06-03 16:44   ` Walker, Benjamin
@ 2019-06-14  8:42     ` David Marchand
  0 siblings, 0 replies; 5+ messages in thread
From: David Marchand @ 2019-06-14  8:42 UTC (permalink / raw)
  To: Walker, Benjamin
  Cc: jerinj, Burakov, Anatoly, dev, Maxime Coquelin, hemant.agrawal,
	shreyansh.jain, Rosen Xu, Gaetan Rivet, Stephen Hemminger, Yigit,
	Ferruh, Thomas Monjalon, Yongseok Koh

After some exchanges off-list and on IRC, and after taking some time to look
at the code, here are my conclusions.
Copying bus driver maintainers/connoisseurs.

We have cases where we prefer using VA even if physical addresses are
available (for example fslmc, where translating from IOVA as PA to VA is
more costly).

I worked on Ben's patches and summarised the problems as two main issues
with the current code:
- physical address availability is not taken into account early enough in
EAL init, and we end up with the memory subsystem complaining later, which
is not user friendly.
  A side effect is that init could have fallen back to using VA in most
cases if there were no strong requirement on PA.
- the pci bus driver looks at all devices on the system, without considering
the pci white/blacklist or whether dpdk has a driver that supports the
device
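
A minimal sketch of how these two points could be addressed, assuming
hypothetical names throughout (`pci_dev_info`, `dev_is_relevant`,
`early_iova_check` are illustrative and not taken from the DPDK tree). It
counts only devices that pass the white/blacklist and have a matching dpdk
driver, and it rejects an unsatisfiable PA request at init time instead of
letting the memory subsystem complain later:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the DPDK API. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

struct pci_dev_info {
    bool blocked;          /* excluded by the pci white/blacklist */
    bool has_dpdk_driver;  /* a registered dpdk driver matches the device */
};

/* Only devices dpdk will actually use should influence the IOVA choice. */
static bool dev_is_relevant(const struct pci_dev_info *dev)
{
    assert(dev != NULL);
    return !dev->blocked && dev->has_dpdk_driver;
}

/*
 * Resolve the wanted mode early in init.
 * Returns 0 and stores the final mode in *out, or -1 when PA is required
 * but physical addresses are not available to this process.
 */
static int early_iova_check(enum iova_mode wanted, bool phys_available,
                            enum iova_mode *out)
{
    assert(out != NULL);
    if (wanted == IOVA_PA && !phys_available)
        return -1;
    if (wanted == IOVA_DC)  /* no strong requirement: fall back as needed */
        wanted = phys_available ? IOVA_PA : IOVA_VA;
    *out = wanted;
    return 0;
}
```

The fallback for the "no strong requirement" case (PA when available, VA
otherwise) is an assumption of this sketch, not a statement about the
series.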

I prepared a new series that I will send shortly.
I am currently considering the backport potential for it.
Thoughts?

Otherwise, reviews are welcome.

Thanks.

-- 
David Marchand


* Re: [dpdk-dev] eal/pci: Improve automatic selection of IOVA mode
  2019-06-03 10:48 ` David Marchand
@ 2019-06-03 16:44   ` Walker, Benjamin
  2019-06-14  8:42     ` David Marchand
  0 siblings, 1 reply; 5+ messages in thread
From: Walker, Benjamin @ 2019-06-03 16:44 UTC (permalink / raw)
  To: david.marchand, jerinj; +Cc: Burakov, Anatoly, dev

On Mon, 2019-06-03 at 12:48 +0200, David Marchand wrote:
> Hello, 
> 
> On Thu, May 30, 2019 at 7:48 PM Ben Walker <benjamin.walker@intel.com> wrote:
> > In SPDK, not all drivers are registered with DPDK at start up time.
> > Previously, that meant DPDK always chose to set itself up in IOVA_PA
> > mode. Instead, when the correct iova choice is unclear based on the
> > devices and drivers known to DPDK at start up time, use other heuristics
> > (such as whether /proc/self/pagemap is accessible) to make a better
> > choice.
> > 
> > This enables SPDK to run as an unprivileged user again without requiring
> > users to explicitly set the iova mode on the command line.
> > 
> 
> Interesting, I got a bz on something similar the day you sent this patchset
> ;-)
> 
> 
> - When a dpdk process is started, either it has access to physical addresses
> or not, and this won't change for the rest of its life.
> Your fix on defaulting to VA based on a rte_eal_using_phys_addrs() check makes
> sense to me.
> It is the most encountered situation when running ovs as non root on recent
> kernels.
> 
> 
> - However, I fail to see the need for all of this detection code wrt drivers
> and devices.
> 
> On one side of the equation, when dpdk starts, it checks physical address
> availability.
> On the other side of the equation, we have the drivers that will be invoked
> when probing devices (either at dpdk init, or when hotplugging a device).
> 
> At this point, the probing call should check the driver requirement wrt the
> kernel driver the device is attached to.
> If this requirement is not fulfilled, then the probing fails.
> 
> 
> - This leaves the --iova-va forcing option. 
> Why do we need it?
> If we don't have access to physical addresses, no choice but run in VA mode.
> If we have access to physical addresses, the only case would be that you want
> to downgrade from PA to VA.
> But well, your process can still access it, not sure what the benefit is.

All of the complexity here, at least as far as I understand it, stems from
supporting hot insert of devices. This is very important to SPDK because storage
devices get hot inserted all the time, so we very much appreciate that DPDK has
put in so much effort in this area and continues to accept our patches to
improve it. I know hot insert is not nearly as important for network devices.

When DPDK starts up, it needs to select whether to use virtual addresses or
physical addresses in its memory maps. It can do that by answering the following
questions:

1. Does the system only have buses that support an IOMMU?
2. Is the IOMMU sufficiently fast for the use case?
3. Will all of the devices that will be used with DPDK throughout the
application's lifetime work with an IOMMU?

If these three things are true, then the best choice is to use virtual addresses
in the memory translations. However, if any of the above are not true it needs
to fall back to physical addresses.
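
The three questions reduce to a single conjunction. A minimal sketch,
assuming hypothetical names (`iova_inputs`, `select_iova_mode`) that are not
part of the DPDK API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the DPDK API. */
enum iova_mode { IOVA_PA, IOVA_VA };

struct iova_inputs {
    bool all_buses_support_iommu;      /* question 1 */
    bool iommu_fast_enough;            /* question 2 */
    bool all_devices_work_with_iommu;  /* question 3 */
};

/* Virtual addresses only when all three answers are yes, else physical. */
static enum iova_mode select_iova_mode(const struct iova_inputs *in)
{
    assert(in != NULL);
    if (in->all_buses_support_iommu &&
        in->iommu_fast_enough &&
        in->all_devices_work_with_iommu)
        return IOVA_VA;
    return IOVA_PA;
}
```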

#1 is checked by simply asking all of the buses, which are known up front. #2 is
just assumed to be true. But #3 is not possible to check fully because of hot
insert.

The code currently approximates the #3 check by looking at the devices present
at initialization time. If a device exists that's bound to vfio-pci, and no
other devices exist that are bound to a uio driver, and DPDK has a registered
driver that's actually going to load against the vfio-pci devices, then it will
elect to use virtual addresses. This is purely a heuristic - it's not a
definitive answer because the user could later hot insert a device that gets
bound to uio.
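
The approximation described above could be sketched as follows. The types
and the function name are hypothetical and only model the rule as stated
here; they are not the code in the PCI bus driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device seen at initialization time. */
enum kernel_driver { KDRV_NONE, KDRV_UIO, KDRV_VFIO };

struct scanned_dev {
    enum kernel_driver kdrv;  /* kernel driver the device is bound to */
    bool has_dpdk_driver;     /* a registered DPDK driver will load on it */
};

/*
 * Returns true when the devices present suggest virtual addresses:
 * no device is bound to uio, and at least one vfio-pci device has a
 * DPDK driver that will load against it.
 */
static bool devices_suggest_va(const struct scanned_dev *devs, size_t n)
{
    bool usable_vfio = false;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].kdrv == KDRV_UIO)
            return false;
        if (devs[i].kdrv == KDRV_VFIO && devs[i].has_dpdk_driver)
            usable_vfio = true;
    }
    return usable_vfio;
}
```

Because the scan only sees devices present at start up, a later hot insert
of a uio-bound device can still invalidate the answer.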

The user, of course, typically knows which addressing scheme to use.
For example, these checks assume #2 is true, but there may be
hardware implementations where it is not and the user wants to force physical
addresses. Or the user may know that they are going to hot insert a device at
run time that doesn't work with the IOMMU. That's why it's important to maintain
the ability for the user to override the default heuristic's decision via the
command line.
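
A minimal sketch of that precedence, assuming the override arrives as a
string such as "pa" or "va" (the values accepted by EAL's --iova-mode
option). The parsing and the function names here are a hypothetical
stand-in, not EAL's option handling:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; IOVA_DC means "no preference". */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Map a user-supplied string to a mode; anything else means no override. */
static enum iova_mode parse_iova_override(const char *arg)
{
    if (arg == NULL)
        return IOVA_DC;
    if (strcmp(arg, "pa") == 0)
        return IOVA_PA;
    if (strcmp(arg, "va") == 0)
        return IOVA_VA;
    return IOVA_DC;
}

/* An explicit user choice always wins over the heuristic's result. */
static enum iova_mode resolve_iova_mode(enum iova_mode user,
                                        enum iova_mode heuristic)
{
    return user != IOVA_DC ? user : heuristic;
}
```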

My patch series is simply improving the heuristic in a few ways. First,
previously each bus when queried would return either virtual or physical
addresses as its choice. However, often the bus just does not have enough
information to formulate any preference at all (and PCI was defaulting to
physical addresses in this case). Instead, I made it so that the bus can return
that it doesn't care, which pushes the decision up to a higher level. That
higher level then makes the decision by checking whether it can access
/proc/self/pagemap. Second, I narrowed the uio check such that physical
addresses will only be selected if a device bound to uio exists and there is a
driver registered to use it. Previously if any device was bound to uio it would
select physical addresses, even if DPDK never ended up loading against that
device.
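
The first change could be sketched like this. The names are hypothetical,
and two details are assumptions of the sketch rather than something the
series is known to do exactly this way: a bus asking for physical addresses
wins over one asking for virtual addresses, and an accessible pagemap leads
to physical addresses when no bus has a preference:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the DPDK API. */
enum bus_pref { BUS_DC, BUS_PA, BUS_VA };  /* BUS_DC: bus does not care */
enum iova_mode { MODE_PA, MODE_VA };

/*
 * Combine per-bus preferences. When every bus answers "don't care",
 * the decision moves up a level and depends on pagemap access.
 */
static enum iova_mode pick_iova_mode(const enum bus_pref *prefs, size_t n,
                                     bool pagemap_accessible)
{
    bool want_va = false;

    assert(prefs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (prefs[i] == BUS_PA)
            return MODE_PA;  /* assumed: PA wins any conflict */
        if (prefs[i] == BUS_VA)
            want_va = true;
    }
    if (want_va)
        return MODE_VA;
    return pagemap_accessible ? MODE_PA : MODE_VA;
}
```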

I think these two things make the heuristic choose the right thing more often,
but it still won't always get it right so the command line option needs to
remain.

Thanks,
Ben

> 
> 
> Jerin, I can see in the history you worked on this.
> What did I miss?
> Is there something wrong with dropping the detection code?
> 
> 
> 
> -- 
> David Marchand



* Re: [dpdk-dev] eal/pci: Improve automatic selection of IOVA mode
  2019-05-30 17:48 Ben Walker
@ 2019-06-03 10:48 ` David Marchand
  2019-06-03 16:44   ` Walker, Benjamin
  0 siblings, 1 reply; 5+ messages in thread
From: David Marchand @ 2019-06-03 10:48 UTC (permalink / raw)
  To: Ben Walker, Jerin Jacob Kollanukkaran; +Cc: dev, Burakov, Anatoly

Hello,

On Thu, May 30, 2019 at 7:48 PM Ben Walker <benjamin.walker@intel.com>
wrote:

> In SPDK, not all drivers are registered with DPDK at start up time.
> Previously, that meant DPDK always chose to set itself up in IOVA_PA
> mode. Instead, when the correct iova choice is unclear based on the
> devices and drivers known to DPDK at start up time, use other heuristics
> (such as whether /proc/self/pagemap is accessible) to make a better
> choice.
>
> This enables SPDK to run as an unprivileged user again without requiring
> users to explicitly set the iova mode on the command line.
>
>
Interesting, I got a bz on something similar the day you sent this patchset
;-)


- When a dpdk process is started, either it has access to physical
addresses or not, and this won't change for the rest of its life.
Your fix on defaulting to VA based on a rte_eal_using_phys_addrs() check
makes sense to me.
It is the most encountered situation when running ovs as non root on recent
kernels.


- However, I fail to see the need for all of this detection code wrt
drivers and devices.

On one side of the equation, when dpdk starts, it checks physical address
availability.
On the other side of the equation, we have the drivers that will be invoked
when probing devices (either at dpdk init, or when hotplugging a device).

At this point, the probing call should check the driver requirement wrt
the kernel driver the device is attached to.
If this requirement is not fulfilled, then the probing fails.
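
A minimal sketch of such a probe-time check, with hypothetical names. It
assumes that a device bound to a uio kernel driver needs physical addresses
and that a device bound to vfio with an IOMMU can work in either mode; real
drivers may have further requirements:

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the DPDK API. */
enum iova_mode { IOVA_PA, IOVA_VA };
enum kernel_driver { KDRV_UIO, KDRV_VFIO };

/*
 * Decide at probe time whether a device can be used in the mode the
 * process selected at startup. The mode is fixed for the process
 * lifetime, so a mismatch means the probe has to fail.
 */
static bool probe_allowed(enum kernel_driver kdrv, enum iova_mode process_mode)
{
    /* uio offers no IOMMU translation, so DMA needs physical addresses */
    if (kdrv == KDRV_UIO)
        return process_mode == IOVA_PA;
    /* assumed: vfio with an IOMMU handles either mode */
    return true;
}
```

The same check would apply whether the probe happens at init or on hotplug.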


- This leaves the --iova-va forcing option.
Why do we need it?
If we don't have access to physical addresses, no choice but run in VA mode.
If we have access to physical addresses, the only case would be that you
want to downgrade from PA to VA.
But well, your process can still access it, not sure what the benefit is.


Jerin, I can see in the history you worked on this.
What did I miss?
Is there something wrong with dropping the detection code?



-- 
David Marchand


* [dpdk-dev] eal/pci: Improve automatic selection of IOVA mode
@ 2019-05-30 17:48 Ben Walker
  2019-06-03 10:48 ` David Marchand
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Walker @ 2019-05-30 17:48 UTC (permalink / raw)
  To: dev

In SPDK, not all drivers are registered with DPDK at start up time.
Previously, that meant DPDK always chose to set itself up in IOVA_PA
mode. Instead, when the correct iova choice is unclear based on the
devices and drivers known to DPDK at start up time, use other heuristics
(such as whether /proc/self/pagemap is accessible) to make a better
choice.

This enables SPDK to run as an unprivileged user again without requiring
users to explicitly set the iova mode on the command line.
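
One way the pagemap heuristic could be probed is sketched below. This is an
illustration only, not the implementation of rte_eal_using_phys_addrs(); the
function name is hypothetical. It reads the pagemap entry for a stack
variable and checks for a non-zero page frame number, since on recent
kernels an unprivileged process may be able to open the file but sees the
frame number zeroed:

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 when this process can resolve a virtual
 * address to a physical one through /proc/self/pagemap, 0 otherwise.
 */
static int phys_addrs_available(void)
{
    volatile int probe = 1;  /* lives on the stack, so its page is mapped */
    uint64_t entry = 0;
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset;
    ssize_t n;
    int fd;

    if (page_size <= 0)
        return 0;
    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return 0;  /* no procfs, or access denied */

    /* one 64-bit entry per virtual page */
    offset = (off_t)((uintptr_t)&probe / (uintptr_t)page_size)
             * (off_t)sizeof(entry);
    n = pread(fd, &entry, sizeof(entry), offset);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return 0;
    if ((entry >> 63) == 0)
        return 0;  /* bit 63: page present in RAM */
    /* bits 0-54 hold the page frame number; zero means it was hidden */
    return (entry & ((UINT64_C(1) << 55) - 1)) != 0;
}
```

When this returns 0, physical addresses cannot be used, which leaves VA as
the only workable mode.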



