* [dpdk-dev] [PATCH] bus/pci: forbid VA as IOVA mode if IOMMU address width too small
@ 2018-01-08 13:51 Maxime Coquelin
2018-01-08 15:34 ` Stephen Hemminger
2018-01-08 15:38 ` Stephen Hemminger
0 siblings, 2 replies; 6+ messages in thread
From: Maxime Coquelin @ 2018-01-08 13:51 UTC
To: dev, stable, jianfeng.tan, santosh.shukla, anatoly.burakov, thomas
Cc: peterx, Maxime Coquelin
Intel VT-d supports different address widths for the IOVAs, from
39 bits to 56 bits.
While recent processors support at least 48 bits, VT-d emulation
currently only supports 39 bits. This makes DMA mapping fail when
using VA as IOVA mode, as user-space virtual addresses use up to
47 bits (see kernel's Documentation/x86/x86_64/mm.txt).
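To make the width mismatch concrete: with 4-level paging, the top of x86_64 user space is 0x00007fffffffffff (per mm.txt), which needs 47 bits, while a 39-bit IOVA space ends at 0x7fffffffff. A minimal sketch of that comparison follows; the helpers addr_width() and iova_space_covers_va() and the macro X86_USER_VA_MAX are hypothetical names for illustration, not DPDK APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Top of x86_64 user space with 4-level paging (47 bits), from mm.txt. */
#define X86_USER_VA_MAX 0x00007fffffffffffULL

/* Number of significant bits in addr (0 for addr == 0). */
static unsigned int
addr_width(uint64_t addr)
{
	unsigned int bits = 0;

	while (addr != 0) {
		bits++;
		addr >>= 1;
	}
	return bits;
}

/*
 * Non-zero if every address in [0, max_va] fits in an IOVA space
 * of iova_bits bits, i.e. if VA could be used as IOVA.
 */
static int
iova_space_covers_va(uint64_t max_va, unsigned int iova_bits)
{
	assert(iova_bits <= 64);
	return addr_width(max_va) <= iova_bits;
}
```

For example, iova_space_covers_va(X86_USER_VA_MAX, 39) returns 0, while the same call with 48 returns 1.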
This patch parses the VT-d CAP register value available in sysfs, and
forbids VA as IOVA mode if the GAW is 39 bits or unknown.
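A minimal sketch of the decoding step, assuming the sysfs cap file holds the register as a hex string. It reuses the SAGAW shift and mask from the patch. Note that the VT-d specification defines SAGAW as a bitmap rather than an enumeration (bit 1 = 39-bit, bit 2 = 48-bit, bit 3 = 57-bit AGAW), so a SAGAW value of 6 means both 39 and 48 bits are supported. The function names vtd_max_sagaw_width() and vtd_width_from_cap_string() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VTD_CAP_SAGAW_SHIFT 8
#define VTD_CAP_SAGAW_MASK  (0x1fULL << VTD_CAP_SAGAW_SHIFT)

/*
 * Widest adjusted guest address width advertised in SAGAW,
 * or 0 if none of the bits known to this sketch is set.
 */
static int
vtd_max_sagaw_width(uint64_t cap)
{
	uint64_t sagaw = (cap & VTD_CAP_SAGAW_MASK) >> VTD_CAP_SAGAW_SHIFT;

	if (sagaw & (1ULL << 3))
		return 57; /* 5-level page table */
	if (sagaw & (1ULL << 2))
		return 48; /* 4-level page table */
	if (sagaw & (1ULL << 1))
		return 39; /* 3-level page table */
	return 0;
}

/* Parse the hex string read from .../iommu/intel-iommu/cap. */
static int
vtd_width_from_cap_string(const char *s)
{
	char *end;
	unsigned long long cap;

	assert(s != NULL);
	cap = strtoull(s, &end, 16);
	if (end == s)
		return 0; /* no hex digits: treat as unknown */
	return vtd_max_sagaw_width((uint64_t)cap);
}
```

Taking the widest set bit means a value such as 6 is treated as 48 bits, which is what matters when comparing against the 47-bit user VA range.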
Fixes: f37dfab21c98 ("drivers/net: enable IOVA mode for Intel PMDs")
Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
Hi,
I'm not super happy with the patch, as it does platform-specific things in
generic code, but there is no placeholder for IOMMU/VT-d at the moment.
As this patch is to be backported to v17.11 LTS, it cannot be a big rework.
If you have some suggestions to improve it, please let me know.
The fix is quite urgent, as guest device assignment with vIOMMU is broken in
mainline & v17.11 LTS.
The advantage of this fix over forbidding VA as IOVA when running in emulation
is that VT-d emulation will soon support 48 bits, so it is future-proof. Also,
the VT-d spec allows 39 bits, so there could be physical CPUs supporting only
that, even if I don't know of any.
Thanks,
Maxime
drivers/bus/pci/linux/pci.c | 98 ++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 89 insertions(+), 9 deletions(-)
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 5da6728fb..292633ee2 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -576,6 +576,90 @@ pci_one_device_has_iova_va(void)
return 0;
}
+static inline bool
+pci_one_device_iommu_support_va(struct rte_pci_device *dev)
+{
+#if defined(RTE_ARCH_PPC_64)
+ return false;
+#elif defined(RTE_ARCH_X86)
+
+#define VTD_CAP_SAGAW_SHIFT 8
+#define VTD_CAP_SAGAW_MASK (0x1fULL << VTD_CAP_SAGAW_SHIFT)
+#define X86_VA_WIDTH 47 /* From Documentation/x86/x86_64/mm.txt */
+ struct rte_pci_addr *addr = &dev->addr;
+ char filename[PATH_MAX];
+ FILE *fp;
+ uint64_t sagaw, vtd_cap_reg = 0;
+ int guest_addr_width = 0;
+
+ snprintf(filename, sizeof(filename),
+ "%s/" PCI_PRI_FMT "/iommu/intel-iommu/cap",
+ rte_pci_get_sysfs_path(), addr->domain, addr->bus, addr->devid,
+ addr->function);
+ if (access(filename, F_OK) == -1) {
+ /* We don't have an Intel IOMMU, assume VA supported */
+ return true;
+ }
+
+ /* We have an Intel IOMMU */
+ fp = fopen(filename, "r");
+ if (fp == NULL) {
+ RTE_LOG(ERR, EAL, "%s(): can't open %s\n", __func__, filename);
+ return false;
+ }
+
+ if (fscanf(fp, "%" SCNx64, &vtd_cap_reg) != 1) {
+ RTE_LOG(ERR, EAL, "%s(): can't read %s\n", __func__, filename);
+ fclose(fp);
+ return false;
+ }
+
+ fclose(fp);
+
+ sagaw = (vtd_cap_reg & VTD_CAP_SAGAW_MASK) >> VTD_CAP_SAGAW_SHIFT;
+
+ switch (sagaw) {
+ case 2:
+ guest_addr_width = 39;
+ break;
+ case 4:
+ guest_addr_width = 48;
+ break;
+ case 6: /* SAGAW is a bitmap: 39 and 48 bits both supported */
+ guest_addr_width = 48;
+ break;
+ default:
+ RTE_LOG(ERR, EAL, "Unknown Intel IOMMU SAGAW value (%" PRIx64 ")\n",
+ sagaw);
+ break;
+ }
+
+ if (guest_addr_width < X86_VA_WIDTH)
+ return false;
+#endif
+ return true;
+}
+
+/*
+ * Check that the IOMMUs of all matched devices support VA as IOVA
+ */
+static inline bool
+pci_devices_iommu_support_va(void)
+{
+ struct rte_pci_device *dev = NULL;
+ struct rte_pci_driver *drv = NULL;
+
+ FOREACH_DRIVER_ON_PCIBUS(drv) {
+ FOREACH_DEVICE_ON_PCIBUS(dev) {
+ if (!rte_pci_match(drv, dev))
+ continue;
+ if (!pci_one_device_iommu_support_va(dev))
+ return false;
+ }
+ }
+ return true;
+}
+
/*
* Get iommu class of PCI devices on the bus.
*/
@@ -586,12 +670,7 @@ rte_pci_get_iommu_class(void)
bool is_vfio_noiommu_enabled = true;
bool has_iova_va;
bool is_bound_uio;
- bool spapr_iommu =
-#if defined(RTE_ARCH_PPC_64)
- true;
-#else
- false;
-#endif
+ bool iommu_no_va;
is_bound = pci_one_device_is_bound();
if (!is_bound)
@@ -599,13 +678,14 @@ rte_pci_get_iommu_class(void)
has_iova_va = pci_one_device_has_iova_va();
is_bound_uio = pci_one_device_bound_uio();
+ iommu_no_va = !pci_devices_iommu_support_va();
#ifdef VFIO_PRESENT
is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ?
true : false;
#endif
if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled &&
- !spapr_iommu)
+ !iommu_no_va)
return RTE_IOVA_VA;
if (has_iova_va) {
@@ -614,8 +694,8 @@ rte_pci_get_iommu_class(void)
RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
if (is_bound_uio)
RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
- if (spapr_iommu)
- RTE_LOG(WARNING, EAL, "sPAPR IOMMU does not support IOVA as VA\n");
+ if (iommu_no_va)
+ RTE_LOG(WARNING, EAL, "IOMMU does not support IOVA as VA\n");
}
return RTE_IOVA_PA;
--
2.14.3
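The patch reads the VT-d cap register from sysfs with fscanf() into a uint64_t. Since uint64_t is not necessarily unsigned long (it is unsigned long long on 32-bit x86), "%lx" is not a portable conversion for it; the <inttypes.h> macros are. A minimal standalone sketch of that read, using a hypothetical helper name read_hex_u64():

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Read a 64-bit hex register value from a sysfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * SCNx64 matches uint64_t on both 32-bit and 64-bit targets.
 */
static int
read_hex_u64(const char *path, uint64_t *val)
{
	FILE *fp;
	int ret = -1;

	assert(path != NULL && val != NULL);
	fp = fopen(path, "r");
	if (fp == NULL)
		return -1;
	if (fscanf(fp, "%" SCNx64, val) == 1)
		ret = 0;
	fclose(fp);
	return ret;
}
```

The same reasoning applies when logging the value: PRIx64 is the matching printf-side macro.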
* Re: [dpdk-dev] [PATCH] bus/pci: forbid VA as IOVA mode if IOMMU address width too small
2018-01-08 13:51 [dpdk-dev] [PATCH] bus/pci: forbid VA as IOVA mode if IOMMU address width too small Maxime Coquelin
@ 2018-01-08 15:34 ` Stephen Hemminger
2018-01-08 15:48 ` Maxime Coquelin
2018-01-08 15:38 ` Stephen Hemminger
1 sibling, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2018-01-08 15:34 UTC
To: Maxime Coquelin
Cc: dev, stable, jianfeng.tan, santosh.shukla, anatoly.burakov,
thomas, peterx
On Mon, 8 Jan 2018 14:51:27 +0100
Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
> [...]
You are assuming that if an IOMMU is present, it is being used (i.e. VFIO).
What about the case of direct access to PF device via IGB_UIO?
> +static inline bool
> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
> +{
This is not in the fast path; there is no reason for it to be inline.
* Re: [dpdk-dev] [PATCH] bus/pci: forbid VA as IOVA mode if IOMMU address width too small
2018-01-08 13:51 [dpdk-dev] [PATCH] bus/pci: forbid VA as IOVA mode if IOMMU address width too small Maxime Coquelin
2018-01-08 15:34 ` Stephen Hemminger
@ 2018-01-08 15:38 ` Stephen Hemminger
2018-01-08 15:54 ` Maxime Coquelin
1 sibling, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2018-01-08 15:38 UTC
To: Maxime Coquelin
Cc: dev, stable, jianfeng.tan, santosh.shukla, anatoly.burakov,
thomas, peterx
On Mon, 8 Jan 2018 14:51:27 +0100
Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
> +static inline bool
> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
> +{
> +#if defined(RTE_ARCH_PPC_64)
> + return false;
> +#elif defined(RTE_ARCH_X86)
> +
The cleaner way to handle this kind of ifdef is:
#ifdef RTE_ARCH_X86
static bool
pci_one_device_iommu_support_va(struct rte_pci_device *dev)
{
....
}
#elif defined(RTE_ARCH_PPC_64)
static inline bool
pci_one_device_iommu_support_va(struct rte_pci_device *dev)
{
return false;
}
#endif
What about AMD64?
Do all ARM processors have an IOMMU? I think not.
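The suggested layout ends with #endif and no #else, so on an architecture that is neither x86 nor PPC64 (e.g. ARM) there would be no definition of pci_one_device_iommu_support_va() and the caller would fail to build. A minimal sketch of the same pattern with a fallback branch that keeps the original patch's default of allowing VA. The struct pci_dev_stub type and its iommu_addr_width field are hypothetical stand-ins for struct rte_pci_device and the sysfs lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rte_pci_device. */
struct pci_dev_stub {
	int iommu_addr_width; /* 0 if unknown */
};

#if defined(RTE_ARCH_X86)
/* x86: the IOMMU must cover the 47-bit user VA range. */
static bool
pci_one_device_iommu_support_va(const struct pci_dev_stub *dev)
{
	assert(dev != NULL);
	return dev->iommu_addr_width >= 47;
}
#elif defined(RTE_ARCH_PPC_64)
/* sPAPR IOMMU: VA as IOVA is not supported. */
static bool
pci_one_device_iommu_support_va(const struct pci_dev_stub *dev)
{
	(void)dev;
	return false;
}
#else
/* Other architectures: no known restriction, allow VA as IOVA. */
static bool
pci_one_device_iommu_support_va(const struct pci_dev_stub *dev)
{
	(void)dev;
	return true;
}
#endif
```

Each branch is a complete function, so the per-architecture logic stays out of the generic caller.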
* Re: [dpdk-dev] [PATCH] bus/pci: forbid VA as IOVA mode if IOMMU address width too small
2018-01-08 15:34 ` Stephen Hemminger
@ 2018-01-08 15:48 ` Maxime Coquelin
0 siblings, 0 replies; 6+ messages in thread
From: Maxime Coquelin @ 2018-01-08 15:48 UTC
To: Stephen Hemminger
Cc: dev, stable, jianfeng.tan, santosh.shukla, anatoly.burakov,
thomas, peterx
On 01/08/2018 04:34 PM, Stephen Hemminger wrote:
> On Mon, 8 Jan 2018 14:51:27 +0100
> Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
>
>> [...]
>
> You are assumming that if IOMMU is present that it is being used (ie VFIO).
> What about the case of direct access to PF device via IGB_UIO?
As soon as one device is bound to UIO, or VFIO is in noiommu mode, PA as
IOVA mode is selected.
This is done in rte_pci_get_iommu_class(), by calling
pci_one_device_bound_uio() and rte_vfio_noiommu_is_enabled().
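The selection described above reduces to one condition in rte_pci_get_iommu_class(). A minimal sketch of just that final check, with hypothetical stand-ins IOVA_PA/IOVA_VA for RTE_IOVA_PA/RTE_IOVA_VA; it does not model the earlier handling of the case where no device is bound.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for RTE_IOVA_PA / RTE_IOVA_VA. */
enum iova_mode { IOVA_PA, IOVA_VA };

/*
 * VA as IOVA is chosen only if every condition allows it; any
 * UIO-bound device, VFIO noiommu mode, or an IOMMU that cannot
 * cover the VA range falls back to PA as IOVA.
 */
static enum iova_mode
choose_iova_mode(bool has_iova_va, bool is_bound_uio,
		 bool is_vfio_noiommu_enabled, bool iommu_no_va)
{
	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled &&
	    !iommu_no_va)
		return IOVA_VA;
	return IOVA_PA;
}
```

This is why a PF bound to igb_uio never reaches the IOMMU width check's outcome: the UIO condition alone already forces PA.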
>> +static inline bool
>> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
>> +{
>
> This is not in fast path, there is no reason it should be inline
>
OK, I will remove the inlining in v2. I added it for consistency with the
other functions declared above.
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH] bus/pci: forbid VA as IOVA mode if IOMMU address width too small
2018-01-08 15:38 ` Stephen Hemminger
@ 2018-01-08 15:54 ` Maxime Coquelin
2018-01-08 16:48 ` Maxime Coquelin
0 siblings, 1 reply; 6+ messages in thread
From: Maxime Coquelin @ 2018-01-08 15:54 UTC
To: Stephen Hemminger
Cc: dev, stable, jianfeng.tan, santosh.shukla, anatoly.burakov,
thomas, peterx
On 01/08/2018 04:38 PM, Stephen Hemminger wrote:
> On Mon, 8 Jan 2018 14:51:27 +0100
> Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
>
>> +static inline bool
>> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
>> +{
>> +#if defined(RTE_ARCH_PPC_64)
>> + return false;
>> +#elif defined(RTE_ARCH_X86)
>> +
>
> The cleaner way to handle this kind of ifdef is:
>
> #ifdef RTE_ARCH_X86
> static bool
> pci_one_device_iommu_support_va(struct rte_pci_device *dev)
> {
> ....
> }
> #elif defined(RTE_ARCH_PPC_64)
> static inline bool
> pci_one_device_iommu_support_va(struct rte_pci_device *dev)
> {
> return false;
> }
> #endif
OK, thanks. I will do this in v2.
> What about AMD64?
I haven't checked the AMD64 spec yet.
> Do all ARM processors have IOMMU, I think not.
No, not all of them have an IOMMU, and I don't know whether those that
have one have such limitations.
But those without an IOMMU cannot use VFIO unless noiommu mode is
enabled, in which case PA as IOVA is selected anyway.
This patch only changes the behavior for Intel, and could be extended to
other HW if needed.
Regards,
Maxime
* Re: [dpdk-dev] [PATCH] bus/pci: forbid VA as IOVA mode if IOMMU address width too small
2018-01-08 15:54 ` Maxime Coquelin
@ 2018-01-08 16:48 ` Maxime Coquelin
0 siblings, 0 replies; 6+ messages in thread
From: Maxime Coquelin @ 2018-01-08 16:48 UTC
To: Stephen Hemminger
Cc: dev, stable, jianfeng.tan, santosh.shukla, anatoly.burakov,
thomas, peterx
On 01/08/2018 04:54 PM, Maxime Coquelin wrote:
>
>
> On 01/08/2018 04:38 PM, Stephen Hemminger wrote:
>> On Mon, 8 Jan 2018 14:51:27 +0100
>> Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
>>
>>> +static inline bool
>>> +pci_one_device_iommu_support_va(struct rte_pci_device *dev)
>>> +{
>>> +#if defined(RTE_ARCH_PPC_64)
>>> + return false;
>>> +#elif defined(RTE_ARCH_X86)
>>> +
>>
>> The cleaner way to handle this kind of ifdef is:
>>
>> #ifdef RTE_ARCH_X86
>> static bool
>> pci_one_device_iommu_support_va(struct rte_pci_device *dev)
>> {
>> ....
>> }
>> #elif defined(RTE_ARCH_PPC_64)
>> static inline bool
>> pci_one_device_iommu_support_va(struct rte_pci_device *dev)
>> {
>> return false;
>> }
>> #endif
>
> Ok, thanks. I do this in v2.
>
>> What about AMD64?
>
> I haven't checked AMD64 spec yet.
From the AMD IOMMU spec (see [0], page 178), the only supported guest
virtual address size is 48 bits, which is above the 47 bits of user VA
on x86. So in this regard, the AMD IOMMU is compatible with using VA as IOVA.
Cheers,
Maxime
[0]: https://support.amd.com/TechDocs/48882_IOMMU.pdf