From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
David Marchand <david.marchand@redhat.com>
Cc: dev <dev@dpdk.org>, Thomas Monjalon <thomas@monjalon.net>,
Ben Walker <benjamin.walker@intel.com>
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection
Date: Tue, 9 Jul 2019 15:37:05 +0100 [thread overview]
Message-ID: <553b3a91-7458-98d0-9df6-5b53010d326f@intel.com> (raw)
In-Reply-To: <BYAPR18MB24245E21BD5A109E795C0FE5C8F10@BYAPR18MB2424.namprd18.prod.outlook.com>
On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Tuesday, July 9, 2019 7:00 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
>> <david.marchand@redhat.com>
>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
>> Walker <benjamin.walker@intel.com>
>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
>> selection
>>
>> On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:
>>>> -----Original Message-----
>>>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>>>> Sent: Tuesday, July 9, 2019 5:10 PM
>>>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
>>>> <david.marchand@redhat.com>
>>>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>;
>> Ben
>>>> Walker <benjamin.walker@intel.com>
>>>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
>>>> mode selection
>>>>>>> ________________________________________
>>>>>>>
>>>>>>> On Mon, Jul 8, 2019 at 4:25 PM <mailto:jerinj@marvell.com> wrote:
>>>>>>> From: Jerin Jacob <mailto:jerinj@marvell.com>
>>>>>>>
>>>>>>> Existing logic fails to select IOVA mode as VA if driver request
>>>>>>> to enable IOVA as VA.
>>>>>>>
>>>>>>> IOVA as VA has more strict requirement than other modes, so
>>>>>>> enabling positive logic for IOVA as VA selection.
>>>>>>>
>>>>>>> This patch also updates the default IOVA mode as PA for PCI
>>>>>>> devices as it has to deal with DMA engines unlike the virtual
>>>>>>> devices that may need only IOVA as DC.
>>>>>>>
>>>>>>> We have three cases:
>>>>>>> - driver/hw supports IOVA as PA only
>>>>>>>
>>>>>>> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs
>>>>>>> non IOMMU). We are already addressing that case
>>>>>>
>>>>>> I don't get how this works. How does "system capability" affect
>>>>>> what the device itself supports? Are we to assume that *all*
>>>>>> hardware support IOVA as VA by default? "System capability" is more
>>>>>> of a bus issue than an individual device issue, is it not?
>>>>>
>>>>> What I meant is, supporting VA vs PA is function of IOMMU(not the
>>>>> device
>>>> attribute).
>>>>> Ie. Device makes the bus master request, if IOMMU available and
>>>>> enabled in the SYSTEM , It goes over IOMMU and translate the IOVA
>>>>> to
>>>> physical address.
>>>>>
>>>>> Another way to put is, Is there any _PCIe_ device which
>>>>> need/requires RTE_PCI_DRV_NEED_IOVA_AS_PA in
>>>>> rte_pci_driver.drv_flags
>>>>>
>>>>>
>>>>
>>>> Previously, as far as i can tell, the flag was used to indicate
>>>> support for IOVA as VA mode, not *requirement* for IOVA as VA mode.
>>>> For example, there are multiple patches [1][2][3][4] (i'm sure i can
>>>> find more!) that added IOVA as VA support to various drivers, and
>>>> they all were worded it in this exact way
>>>> - "support for IOVA as VA mode", not "require IOVA as VA mode". As
>>>> far as i can tell, none of these drivers *require* IOVA as VA mode -
>>>> they merely use this flag to indicate support for it.
>>>
>>> Some class of devices NEED IOVA as VA for performance reasons.
>>> Specially the devices has HW mempool allocators. On those devices If
>>> we don’t use IOVA as VA, Upon getting packet from device, It needs to
>>> go over rte_mem_iova2virt() per packet see driver/net/dppa2. Which has
>> real performance issue.
>>
>> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot work
>> without it, whereas in this case it's more of a "highly recommended" rather
>> than "need".
>
> It is "need" as performance is horrible without it as is per packet SW translation.
> A "need" for DPDK performance perspective.
Would the driver fail to initialize if it detects running as IOVA as PA?
>
>>
>>>>
>>>> Now suddenly it turns out that someone somewhere "knew" that "IOVA
>> as
>>>> VA" flag in PCI drivers is supposed to indicate *requirement* and not
>>>> support, and it appears that this knowledge was not communicated nor
>>>> documented anywhere, and is now treated as common knowledge.
>>>
>>> I think, the confusion here is, I was under impression that # If
>>> device supports IOVA as VA and system runs with IOMMU then the dpdk
>>> should run in IOVA as VA mode.
>>> If above statement true then we don’t really need a new flag.
>>
>> Exactly. And the flag used to indicate that the device *supports* IOVA as VA,
>> not that it *requires* it.
>>
>>>
>>> Couple of points to make forward progress:
>>> # If we think, there is a use case where device is IOVA as VA And
>>> system runs in IOMMU mode then for some reason DPDK needs to run in
>> PA
>>> mode. If so, we need to create two flags RTE_PCI_DRV_IOVA_AS_VA - it
>>> can run either modes
>>
>> There are use cases - KNI and igb_uio come to mind. Whether IOMMU uses
>> VA or PA is a different from whether IOMMU is in use - there is no law that
>> states that, when using IOMMU, IOVA have to have 1:1 mapping with VA.
>> IOMMU requirement does not necessarily imply IOVA as VA - it is perfectly
>> legal to program IOMMU to use IOVA as PA (which we currently do when we
>> e.g. use VFIO for some devices and igb_uio for others).
>
> For KNI, we already submitted a patch to support IOVA as VA.
Yep, point being that it *didn't work* before, hence we may want to
account for possible future use cases like this (however admittedly
hacky they are). There are valid use cases to enforce IOVA as VA support
only (such as for performance reasons, even though it would be
technically possible to use IOVA as PA), and there could be valid use
cases to enforce IOVA as PA support only (for example, i seem to
remember that crypto/qat driver at one point didn't support VFIO driver,
effectively rendering it not supporting IOVA as VA).
> I don’t know about igb_uio, if IOVA as PA, we might as well disable IOMMU.
> Is igb_uio enables IOMMU? I don’t see any reference.
> grep -ri "iommu" kernel/linux/igb_uio/
igb_uio can work with IOMMU with pass-through mode. When Linux is booted
up in pass-through, IOMMU is enabled and igb_uio will work, and VFIO can
use both IOVA as PA and IOVA as VA, while igb_uio can only use IOVA as
PA. So yes, igb_uio does enable IOMMU in a very limited way, but only to
set up 1:1 mapping of IOVA with PA.
Also, some other use cases will also require IOVA as PA while having
full IOMMU support. An example of this would be systems with limited
IOMMU width (such as VM's) - even though the IOMMU is technically
supported, we may not have the necessary address width to run all
devices in IOVA as VA mode, and would need to fall back to IOVA as PA.
Since we cannot *require* IOVA as VA in current codebase, any driver
that expects IOVA as VA to always be enabled will presumably not work.
>
> Again, it is not device attribute, it is system attribute.
If it's a system attribute, why is it a device driver flag then? The
system may or may not support IOMMU, the device itself probably doesn't
care since bus address looks the same in both cases, *but the driver
might* (such as would be in your case - requiring IOVA as VA and
disallowing IOVA as PA for performance reasons).
> In current KNI case it
> Fall backs to PA irrespective of device capability so we don’t need any
> separate flag from driver.
>
> Even if we introduce a flag, what it is supposed to do?
The same thing you are suggesting for one of your HW mempool drivers: a
"need" to only run in IOVA as VA mode.
The logic in this case (as far as the driver is concerned, disregarding
the kernel driver issue for now) would be:
has_pa = (drv->flags & SUPPORTS_PA) != 0;
has_va = (drv->flags & SUPPORTS_VA) != 0;
if (has_va && has_pa)
return RTE_IOVA_DC; // don't care, supports both
if (has_va)
return RTE_IOVA_VA; // only supports VA - your case
return RTE_IOVA_PA; // only supports PA
Currently (again, disregarding your interpretation of how IOVA as VA
works and looking at the actual commit history), we always seem to imply
that IOVA as PA works for all devices, and we use IOVA_AS_VA flag to
indicate that the device *also* supports IOVA as VA mode.
But we don't have any way to express a *requirement* for IOVA as VA mode
- only for IOVA as PA mode. That is the purpose of the new flag. You are
stating that the IOVA_AS_VA drv flag is an expression of that
requirement, but that is not reflected in the codebase - our commit
history indicates that we don't treat IOVA as VA as hard requirement
whenever this flag is specified (and i would argue that we shouldn't).
>
>>
>>> RTE_PCI_DRV_NEED_IOVA_AS_VA - it can run only on IOVA as VA
>>
>> If we're adding a flag, we might as well not create a confusion and do it
>> consistently. If IOVA as PA is supported, have a flag to indicate that. If IOVA
>> as VA is supported, have a flag to indicate that. Absence of either flag implies
>
> So in what category i40e driver comes? By default, pci bus should return PA for class.
> If VA supported then return VA.
> So how new flag will help?
We seem to be in agreement that we need *two* flags to express all three
of the above. The question is, which flags. You suggest to have
"supports IOVA as VA" and "requires IOVA as VA" as two options, while i
am suggesting to have "supports IOVA as PA" and "supports IOVA as VA" as
flags. This requires modification to all existing drivers and is perhaps
undesirable from that point of view (this isn't my decision), but it's
less confusing than having two IOVA-as-VA flags that differ slightly in
their meaning (supports VA vs. requires VA).
Going back to your i40e example, AFAIK i40e supports both IOVA as VA and
IOVA as PA - so in this case it should return RTE_IOVA_DC (i.e. use
whatever's available). If other devices also don't care, then push the
decision to the upper layers and not decide anything at the bus level [1].
[1] http://patchwork.dpdk.org/patch/54801/
>
>> inability to work in that mode. I don't see how this is less clear and self-
>> documenting than having two IOVA as VA-related flags that have slightly
>> different meaning and imply things not otherwise stated explicitly.
>>
>>> # With top of tree, Currently it never runs in IOVA as VA mode.
>>> That’s a separate problem to fix. Which effect all the devices
>>> Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though Device
>>> support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
>>> protection and/or root privilege is required to run DPDK.
>
> What's your view on this existing problem?
My view would be to always run in IOVA as VA by default and only falling
back to IOVA as PA if there is a need to do that. Yet, it seems that
whenever i try to bring this up, the response (not necessarily from you,
so this is not directed at you specifically) seems to be that because of
hotplug, we have to start in the "safest" (from device support point of
view) mode - that is, in IOVA as PA. Seeing how, as you claim, some
devices require IOVA as VA, then IOVA as PA is no longer the "safe"
default that all devices will support. Perhaps we can use this
opportunity to finally make IOVA as VA the default :)
>
>
>>>
>>>
>>>>
>>>> [1] http://patchwork.dpdk.org/patch/53206/
>>>> [2] http://patchwork.dpdk.org/patch/50274/
>>>> [3] http://patchwork.dpdk.org/patch/50991/
>>>> [4] http://patchwork.dpdk.org/patch/46134/
>>>>
>>>> --
>>>> Thanks,
>>>> Anatoly
>>
>>
>> --
>> Thanks,
>> Anatoly
--
Thanks,
Anatoly
next prev parent reply other threads:[~2019-07-09 14:37 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-08 14:24 [dpdk-dev] " jerinj
2019-07-08 18:39 ` David Marchand
2019-07-08 19:13 ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-09 8:39 ` Bruce Richardson
2019-07-09 9:05 ` Jerin Jacob Kollanukkaran
2019-07-09 9:32 ` Bruce Richardson
2019-07-09 9:44 ` Burakov, Anatoly
2019-07-09 11:13 ` Jerin Jacob Kollanukkaran
2019-07-09 11:40 ` Burakov, Anatoly
2019-07-09 12:11 ` Jerin Jacob Kollanukkaran
2019-07-09 13:30 ` Burakov, Anatoly
2019-07-09 13:50 ` Burakov, Anatoly
2019-07-09 14:19 ` Jerin Jacob Kollanukkaran
2019-07-09 14:00 ` Jerin Jacob Kollanukkaran
2019-07-09 14:37 ` Burakov, Anatoly [this message]
2019-07-09 15:04 ` Thomas Monjalon
2019-07-09 15:06 ` Burakov, Anatoly
2019-07-09 17:50 ` Jerin Jacob Kollanukkaran
2019-07-10 8:09 ` David Marchand
2019-07-09 14:54 ` Burakov, Anatoly
2019-07-09 14:58 ` Jerin Jacob Kollanukkaran
2019-07-09 15:02 ` Burakov, Anatoly
2019-07-09 15:12 ` Thomas Monjalon
2019-07-09 15:18 ` Burakov, Anatoly
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=553b3a91-7458-98d0-9df6-5b53010d326f@intel.com \
--to=anatoly.burakov@intel.com \
--cc=benjamin.walker@intel.com \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=jerinj@marvell.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).