From: David Marchand <david.marchand@redhat.com>
To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: dev <dev@dpdk.org>, Thomas Monjalon <thomas@monjalon.net>,
Ben Walker <benjamin.walker@intel.com>
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection
Date: Wed, 10 Jul 2019 10:09:50 +0200 [thread overview]
Message-ID: <CAJFAV8wK=8fWVAOGhgguCoN4HrRqF65QnN4tiafzQLydFU-CPw@mail.gmail.com> (raw)
In-Reply-To: <BYAPR18MB242483AB7A3D9E9AD6B1FAEEC8F10@BYAPR18MB2424.namprd18.prod.outlook.com>
Hello guys,
On Tue, Jul 9, 2019 at 7:52 PM Jerin Jacob Kollanukkaran <jerinj@marvell.com>
wrote:
> > -----Original Message-----
> > From: Burakov, Anatoly <anatoly.burakov@intel.com>
> > Sent: Tuesday, July 9, 2019 8:07 PM
> > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> > <david.marchand@redhat.com>
> > Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
> > Walker <benjamin.walker@intel.com>
> > Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> > selection
> issue.
> > >>
> > >> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot
> > >> work without it, whereas in this case it's more of a "highly
> > >> recommended" rather than "need".
> > >
> > > It is "need" as performance is horrible without it as is per packet SW
> > translation.
> > > A "need" for DPDK performance perspective.
> >
> > Would the driver fail to initialize if it detects running as IOVA as PA?
>
> Yes.
> https://git.dpdk.org/dpdk/tree/drivers/net/octeontx2/otx2_ethdev.c#n1191
>
> > Also, some other use cases will also require IOVA as PA while having full
> > IOMMU support. An example of this would be systems with limited IOMMU
> > width (such as VM's) - even though the IOMMU is technically supported, we
> > may not have the necessary address width to run all devices in IOVA as VA
> > mode, and would need to fall back to IOVA as PA.
> > Since we cannot *require* IOVA as VA in current codebase, any driver that
> > expects IOVA as VA to always be enabled will presumably not work.
> >
> > >
> > > Again, it is not device attribute, it is system attribute.
> >
> > If it's a system attribute, why is it a device driver flag then? The
> system may
> > or may not support IOMMU, the device itself probably doesn't care since
> bus
> > address looks the same in both cases, *but the driver
> > might* (such as would be in your case - requiring IOVA as VA and
> disallowing
> > IOVA as PA for performance reasons).
>
> Agree.
>
> >
> > Currently (again, disregarding your interpretation of how IOVA as VA
> works
> > and looking at the actual commit history), we always seem to imply that
> IOVA
> > as PA works for all devices, and we use IOVA_AS_VA flag to indicate that
> the
> > device *also* supports IOVA as VA mode.
> >
> > But we don't have any way to express a *requirement* for IOVA as VA mode
> > - only for IOVA as PA mode. That is the purpose of the new flag. You are
> > stating that the IOVA_AS_VA drv flag is an expression of that
> requirement,
> > but that is not reflected in the codebase - our commit history indicates
> that
> > we don't treat IOVA as VA as hard requirement whenever this flag is
> > specified (and i would argue that we shouldn't).
>
> No objection to further classify it.
>
I propose to introduce:
* RTE_PCI_DRV_IOVA_AS_PA which means "the combination of the pmd+kmod+hw
supports usage of Physical Addresses"
* RTE_PCI_DRV_IOVA_AS_VA which means "the combination of the pmd+kmod+hw
supports usage of Virtual Addresses"
- For the pci bus, the algorigthm would be:
devices_want_pa = false
devices_want_va = false
Foreach pci device
Skip blacklisted devices
Skip unbound devices (i.e. we only consider devices bound to a known
kernel driver)
Skip unsupported devices (i.e. we only consider devices that have a pmd
that supports them)
If the combination pmd+kmod only supports VA (RTE_PCI_DRV_IOVA_AS_VA
capability in driver flags), then devices_want_va = true
Else if the combination pmd+kmod only supports PA (RTE_PCI_DRV_IOVA_AS_PA
capability in driver flags), then devices_want_pa = true
If devices_want_va and !devices_want_pa
return RTE_IOVA_VA
If devices_want_pa and !devices_want_va
return RTE_IOVA_PA
return RTE_IOVA_DC
Notes:
* the IOMMU limitations are considered as a per device/driver thing, since
the kmod is the one that configures the system IOMMU,
* the case "devices_want_pa and devices_want_va" is considered as DC, we
leave EAL decide based on the physical addresses availability because we
can't comply with all present devices/drivers in the system.
This means that at bus probe time for a device, we must add a check that
the combination is fulfilled (and avoid this check in the drivers
themselves).
- For the global bus code, that aggregates the different buses preferences,
we need to do the same, while I suspect a bug at the moment.
The algorigthm:
buses_want_pa = false
buses_want_va = false
Foreach bus
If the bus reports RTE_IOVA_VA, then buses_want_va = true
Else if the bus reports RTE_IOVA_PA, then buses_want_pa = true
If buses_want_va and !buses_want_pa
return RTE_IOVA_VA
If buses_want_pa and !buses_want_va
return RTE_IOVA_PA
return RTE_IOVA_DC
- Finally at EAL level, we keep the current code.
Hope I did not miss anything.
If we agree on this, I will send the changes and an update in the
documentation.
--
David Marchand
next prev parent reply other threads:[~2019-07-10 8:10 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-08 14:24 [dpdk-dev] " jerinj
2019-07-08 18:39 ` David Marchand
2019-07-08 19:13 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-09 8:39 ` Bruce Richardson
2019-07-09 9:05 ` Jerin Jacob Kollanukkaran
2019-07-09 9:32 ` Bruce Richardson
2019-07-09 9:44 ` Burakov, Anatoly
2019-07-09 11:13 ` Jerin Jacob Kollanukkaran
2019-07-09 11:40 ` Burakov, Anatoly
2019-07-09 12:11 ` Jerin Jacob Kollanukkaran
2019-07-09 13:30 ` Burakov, Anatoly
2019-07-09 13:50 ` Burakov, Anatoly
2019-07-09 14:19 ` Jerin Jacob Kollanukkaran
2019-07-09 14:00 ` Jerin Jacob Kollanukkaran
2019-07-09 14:37 ` Burakov, Anatoly
2019-07-09 15:04 ` Thomas Monjalon
2019-07-09 15:06 ` Burakov, Anatoly
2019-07-09 17:50 ` Jerin Jacob Kollanukkaran
2019-07-10 8:09 ` David Marchand [this message]
2019-07-09 14:54 ` Burakov, Anatoly
2019-07-09 14:58 ` Jerin Jacob Kollanukkaran
2019-07-09 15:02 ` Burakov, Anatoly
2019-07-09 15:12 ` Thomas Monjalon
2019-07-09 15:18 ` Burakov, Anatoly
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJFAV8wK=8fWVAOGhgguCoN4HrRqF65QnN4tiafzQLydFU-CPw@mail.gmail.com' \
--to=david.marchand@redhat.com \
--cc=anatoly.burakov@intel.com \
--cc=benjamin.walker@intel.com \
--cc=dev@dpdk.org \
--cc=jerinj@marvell.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).