DPDK patches and discussions
 help / color / mirror / Atom feed
From: David Marchand <david.marchand@redhat.com>
To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: dev <dev@dpdk.org>, Thomas Monjalon <thomas@monjalon.net>,
	 Ben Walker <benjamin.walker@intel.com>
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection
Date: Wed, 10 Jul 2019 10:09:50 +0200	[thread overview]
Message-ID: <CAJFAV8wK=8fWVAOGhgguCoN4HrRqF65QnN4tiafzQLydFU-CPw@mail.gmail.com> (raw)
In-Reply-To: <BYAPR18MB242483AB7A3D9E9AD6B1FAEEC8F10@BYAPR18MB2424.namprd18.prod.outlook.com>

Hello guys,

On Tue, Jul 9, 2019 at 7:52 PM Jerin Jacob Kollanukkaran <jerinj@marvell.com>
wrote:

> > -----Original Message-----
> > From: Burakov, Anatoly <anatoly.burakov@intel.com>
> > Sent: Tuesday, July 9, 2019 8:07 PM
> > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> > <david.marchand@redhat.com>
> > Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
> > Walker <benjamin.walker@intel.com>
> > Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> > selection
> issue.
> > >>
> > >> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot
> > >> work without it, whereas in this case it's more of a "highly
> > >> recommended" rather than "need".
> > >
> > > It is "need" as performance is horrible without it as is per packet SW
> > translation.
> > > A "need" for DPDK performance perspective.
> >
> > Would the driver fail to initialize if it detects running as IOVA as PA?
>
> Yes.
> https://git.dpdk.org/dpdk/tree/drivers/net/octeontx2/otx2_ethdev.c#n1191
>
> > Also, some other use cases will also require IOVA as PA while having full
> > IOMMU support. An example of this would be systems with limited IOMMU
> > width (such as VM's) - even though the IOMMU is technically supported, we
> > may not have the necessary address width to run all devices in IOVA as VA
> > mode, and would need to fall back to IOVA as PA.
> > Since we cannot *require* IOVA as VA in current codebase, any driver that
> > expects IOVA as VA to always be enabled will presumably not work.
> >
> > >
> > > Again, it is not device attribute, it is system attribute.
> >
> > If it's a system attribute, why is it a device driver flag then? The
> system may
> > or may not support IOMMU, the device itself probably doesn't care since
> bus
> > address looks the same in both cases, *but the driver
> > might* (such as would be in your case - requiring IOVA as VA and
> disallowing
> > IOVA as PA for performance reasons).
>
> Agree.
>
> >
> > Currently (again, disregarding your interpretation of how IOVA as VA
> works
> > and looking at the actual commit history), we always seem to imply that
> IOVA
> > as PA works for all devices, and we use IOVA_AS_VA flag to indicate that
> the
> > device *also* supports IOVA as VA mode.
> >
> > But we don't have any way to express a *requirement* for IOVA as VA mode
> > - only for IOVA as PA mode. That is the purpose of the new flag. You are
> > stating that the IOVA_AS_VA drv flag is an expression of that
> requirement,
> > but that is not reflected in the codebase - our commit history indicates
> that
> > we don't treat IOVA as VA as hard requirement whenever this flag is
> > specified (and i would argue that we shouldn't).
>
> No objection to further classify it.
>

I propose to introduce:
* RTE_PCI_DRV_IOVA_AS_PA which means "the combination of the pmd+kmod+hw
supports usage of Physical Addresses"
* RTE_PCI_DRV_IOVA_AS_VA which means "the combination of the pmd+kmod+hw
supports usage of Virtual Addresses"

- For the pci bus, the algorigthm would be:

devices_want_pa = false
devices_want_va = false

Foreach pci device
  Skip blacklisted devices
  Skip unbound devices (i.e. we only consider devices bound to a known
kernel driver)
  Skip unsupported devices (i.e. we only consider devices that have a pmd
that supports them)

  If the combination pmd+kmod only supports VA (RTE_PCI_DRV_IOVA_AS_VA
capability in driver flags), then devices_want_va = true
  Else if the combination pmd+kmod only supports PA (RTE_PCI_DRV_IOVA_AS_PA
capability in driver flags), then devices_want_pa = true

If devices_want_va and !devices_want_pa
  return RTE_IOVA_VA
If devices_want_pa and !devices_want_va
  return RTE_IOVA_PA

return RTE_IOVA_DC

Notes:
* the IOMMU limitations are considered as a per device/driver thing, since
the kmod is the one that configures the system IOMMU,
* the case "devices_want_pa and devices_want_va" is considered as DC, we
leave EAL decide based on the physical addresses availability because we
can't comply with all present devices/drivers in the system.
  This means that at bus probe time for a device, we must add a check that
the combination is fulfilled (and avoid this check in the drivers
themselves).


- For the global bus code, that aggregates the different buses preferences,
we need to do the same, while I suspect a bug at the moment.

The algorigthm:

buses_want_pa = false
buses_want_va = false

Foreach bus
  If the bus reports RTE_IOVA_VA, then buses_want_va = true
  Else if the bus reports RTE_IOVA_PA, then buses_want_pa = true

If buses_want_va and !buses_want_pa
  return RTE_IOVA_VA
If buses_want_pa and !buses_want_va
  return RTE_IOVA_PA

return RTE_IOVA_DC


- Finally at EAL level, we keep the current code.


Hope I did not miss anything.
If we agree on this, I will send the changes and an update in the
documentation.


-- 
David Marchand

  reply	other threads:[~2019-07-10  8:10 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-08 14:24 [dpdk-dev] " jerinj
2019-07-08 18:39 ` David Marchand
2019-07-08 19:13   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-09  8:39     ` Bruce Richardson
2019-07-09  9:05       ` Jerin Jacob Kollanukkaran
2019-07-09  9:32         ` Bruce Richardson
2019-07-09  9:44     ` Burakov, Anatoly
2019-07-09 11:13       ` Jerin Jacob Kollanukkaran
2019-07-09 11:40         ` Burakov, Anatoly
2019-07-09 12:11           ` Jerin Jacob Kollanukkaran
2019-07-09 13:30             ` Burakov, Anatoly
2019-07-09 13:50               ` Burakov, Anatoly
2019-07-09 14:19                 ` Jerin Jacob Kollanukkaran
2019-07-09 14:00               ` Jerin Jacob Kollanukkaran
2019-07-09 14:37                 ` Burakov, Anatoly
2019-07-09 15:04                   ` Thomas Monjalon
2019-07-09 15:06                     ` Burakov, Anatoly
2019-07-09 17:50                   ` Jerin Jacob Kollanukkaran
2019-07-10  8:09                     ` David Marchand [this message]
2019-07-09 14:54                 ` Burakov, Anatoly
2019-07-09 14:58                   ` Jerin Jacob Kollanukkaran
2019-07-09 15:02                     ` Burakov, Anatoly
2019-07-09 15:12                       ` Thomas Monjalon
2019-07-09 15:18                         ` Burakov, Anatoly

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJFAV8wK=8fWVAOGhgguCoN4HrRqF65QnN4tiafzQLydFU-CPw@mail.gmail.com' \
    --to=david.marchand@redhat.com \
    --cc=anatoly.burakov@intel.com \
    --cc=benjamin.walker@intel.com \
    --cc=dev@dpdk.org \
    --cc=jerinj@marvell.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).