DPDK patches and discussions
* [dpdk-dev] [PATCH] eal: pick IOVA as PA if IOMMU is not available
@ 2019-07-24 16:46 Anatoly Burakov
  2019-07-25  8:05 ` David Marchand
  2019-07-25  9:52 ` [dpdk-dev] [PATCH v2] " Anatoly Burakov
  0 siblings, 2 replies; 17+ messages in thread
From: Anatoly Burakov @ 2019-07-24 16:46 UTC (permalink / raw)
  To: dev; +Cc: david.marchand, jerinj, thomas

When an IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This has been the case since at least Linux 3.6, when VFIO
support was added. If the directory is empty, EAL should not pick IOVA
as VA as the default IOVA mode.

We also assume that VFIO equals IOMMU, so if VFIO support is not
compiled, we always assume IOMMU support is not available.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linux/eal/eal.c      | 11 ++++++--
 lib/librte_eal/linux/eal/eal_vfio.c | 39 +++++++++++++++++++++++++++++
 lib/librte_eal/linux/eal/eal_vfio.h |  2 ++
 3 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 34db78753..584f97a96 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -1061,8 +1061,15 @@ rte_eal_init(int argc, char **argv)
 		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
 
 		if (iova_mode == RTE_IOVA_DC) {
-			iova_mode = RTE_IOVA_VA;
-			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
+			/* if we have an IOMMU, pick IOVA as VA mode */
+			if (vfio_iommu_enabled()) {
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, selecting IOVA as VA mode.\n");
+			} else {
+				iova_mode = RTE_IOVA_PA;
+				RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, but IOMMU is not available.\n");
+				RTE_LOG(DEBUG, EAL, "Selecting IOVA as PA mode.\n");
+			}
 		}
 #ifdef RTE_LIBRTE_KNI
 		/* Workaround for KNI which requires physical address to work */
diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 501c74f23..6d5ca7903 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2018 Intel Corporation
  */
 
+#include <dirent.h>
 #include <inttypes.h>
 #include <string.h>
 #include <fcntl.h>
@@ -23,6 +24,8 @@
 
 #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
 
+#define VFIO_KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
+
 /* hot plug/unplug of VFIO groups may cause all DMA maps to be dropped. we can
  * recreate the mappings for DPDK segments, but we cannot do so for memory that
  * was registered by the user themselves, so we need to store the user mappings
@@ -2026,6 +2029,33 @@ rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
 	return container_dma_unmap(vfio_cfg, vaddr, iova, len);
 }
 
+/*
+ * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
+ * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
+ * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
+ * checking if the path is empty will tell us if IOMMU is enabled.
+ */
+int
+vfio_iommu_enabled(void)
+{
+	DIR *dir = opendir(VFIO_KERNEL_IOMMU_GROUPS_PATH);
+	struct dirent *d;
+	int n = 0;
+
+	/* if directory doesn't exist, assume IOMMU is not enabled */
+	if (dir == NULL)
+		return 0;
+
+	while ((d = readdir(dir)) != NULL) {
+		/* skip dot and dot-dot */
+		if (++n > 2)
+			break;
+	}
+	closedir(dir);
+
+	return n > 2;
+}
+
 #else
 
 int
@@ -2146,4 +2176,13 @@ rte_vfio_container_dma_unmap(__rte_unused int container_fd,
 	return -1;
 }
 
+/*
+ * VFIO not compiled, so IOMMU unsupported.
+ */
+int
+vfio_iommu_enabled(void)
+{
+	return 0;
+}
+
 #endif /* VFIO_PRESENT */
diff --git a/lib/librte_eal/linux/eal/eal_vfio.h b/lib/librte_eal/linux/eal/eal_vfio.h
index cb2d35fb1..58c7a7309 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.h
+++ b/lib/librte_eal/linux/eal/eal_vfio.h
@@ -133,6 +133,8 @@ vfio_has_supported_extensions(int vfio_container_fd);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_iommu_enabled(void);
+
 #define EAL_VFIO_MP "eal_vfio_mp_sync"
 
 #define SOCKET_REQ_CONTAINER 0x100
-- 
2.17.1
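
The emptiness check described in the patch comment can also be written by comparing entry names rather than counting entries. The following is only a minimal sketch under that assumption: `dir_has_entries()` is a hypothetical helper that is not part of the patch, and it takes the path as a parameter purely so it can be exercised on any directory.

```c
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): returns 1 if `path` contains at
 * least one entry other than "." and "..", and 0 otherwise. A directory that
 * cannot be opened is treated as empty, mirroring the patch's assumption
 * that a missing directory means no IOMMU.
 */
static int
dir_has_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *d;
	int found = 0;

	if (dir == NULL)
		return 0;

	while ((d = readdir(dir)) != NULL) {
		/* skip dot and dot-dot by name, whatever order they arrive in */
		if (strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0)
			continue;
		found = 1;
		break;
	}
	closedir(dir);

	return found;
}
```

Calling `dir_has_entries("/sys/kernel/iommu_groups")` would then play the role of the check in the patch. Comparing names does not rely on "." and ".." being the first two entries returned by readdir().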

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: pick IOVA as PA if IOMMU is not available
  2019-07-24 16:46 [dpdk-dev] [PATCH] eal: pick IOVA as PA if IOMMU is not available Anatoly Burakov
@ 2019-07-25  8:05 ` David Marchand
  2019-07-25  9:31   ` Burakov, Anatoly
  2019-07-25 18:58   ` Thomas Monjalon
  2019-07-25  9:52 ` [dpdk-dev] [PATCH v2] " Anatoly Burakov
  1 sibling, 2 replies; 17+ messages in thread
From: David Marchand @ 2019-07-25  8:05 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Jerin Jacob Kollanukkaran, Thomas Monjalon

On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:
>
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This is happening since at least 3.6 when VFIO support
> was added. If the directory is empty, EAL should not pick IOVA as
> VA as the default IOVA mode.
>
> We also assume that VFIO equals IOMMU, so if VFIO support is not
> compiled, we always assume IOMMU support is not available.

Not sure I agree with this statement.
What about unknown (from eal pov) kernel drivers?


>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/linux/eal/eal.c      | 11 ++++++--
>  lib/librte_eal/linux/eal/eal_vfio.c | 39 +++++++++++++++++++++++++++++
>  lib/librte_eal/linux/eal/eal_vfio.h |  2 ++
>  3 files changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 34db78753..584f97a96 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -1061,8 +1061,15 @@ rte_eal_init(int argc, char **argv)
>                 enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>
>                 if (iova_mode == RTE_IOVA_DC) {
> -                       iova_mode = RTE_IOVA_VA;
> -                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
> +                       /* if we have an IOMMU, pick IOVA as VA mode */
> +                       if (vfio_iommu_enabled()) {
> +                               iova_mode = RTE_IOVA_VA;
> +                               RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, selecting IOVA as VA mode.\n");
> +                       } else {
> +                               iova_mode = RTE_IOVA_PA;
> +                               RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, but IOMMU is not available.\n");
> +                               RTE_LOG(DEBUG, EAL, "Selecting IOVA as PA mode.\n");
> +                       }

Here, since the buses don't care, we can check for physical address
availability.


-- 
David Marchand


* Re: [dpdk-dev] [PATCH] eal: pick IOVA as PA if IOMMU is not available
  2019-07-25  8:05 ` David Marchand
@ 2019-07-25  9:31   ` Burakov, Anatoly
  2019-07-25  9:35     ` David Marchand
  2019-07-25 18:58   ` Thomas Monjalon
  1 sibling, 1 reply; 17+ messages in thread
From: Burakov, Anatoly @ 2019-07-25  9:31 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Jerin Jacob Kollanukkaran, Thomas Monjalon

On 25-Jul-19 9:05 AM, David Marchand wrote:
> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
>>
>> When IOMMU is not available, /sys/kernel/iommu_groups will not be
>> populated. This is happening since at least 3.6 when VFIO support
>> was added. If the directory is empty, EAL should not pick IOVA as
>> VA as the default IOVA mode.
>>
>> We also assume that VFIO equals IOMMU, so if VFIO support is not
>> compiled, we always assume IOMMU support is not available.
> 
> Not sure I agree with this statement.
> What about unknown (from eal pov) kernel drivers?

Are there any cases where we can use IOVA as VA mode without having VFIO 
compiled?

> 
> 
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>   lib/librte_eal/linux/eal/eal.c      | 11 ++++++--
>>   lib/librte_eal/linux/eal/eal_vfio.c | 39 +++++++++++++++++++++++++++++
>>   lib/librte_eal/linux/eal/eal_vfio.h |  2 ++
>>   3 files changed, 50 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
>> index 34db78753..584f97a96 100644
>> --- a/lib/librte_eal/linux/eal/eal.c
>> +++ b/lib/librte_eal/linux/eal/eal.c
>> @@ -1061,8 +1061,15 @@ rte_eal_init(int argc, char **argv)
>>                  enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>>
>>                  if (iova_mode == RTE_IOVA_DC) {
>> -                       iova_mode = RTE_IOVA_VA;
>> -                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
>> +                       /* if we have an IOMMU, pick IOVA as VA mode */
>> +                       if (vfio_iommu_enabled()) {
>> +                               iova_mode = RTE_IOVA_VA;
>> +                               RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, selecting IOVA as VA mode.\n");
>> +                       } else {
>> +                               iova_mode = RTE_IOVA_PA;
>> +                               RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, but IOMMU is not available.\n");
>> +                               RTE_LOG(DEBUG, EAL, "Selecting IOVA as PA mode.\n");
>> +                       }
> 
> Here, since the buses don't care, we can check for physical address
> availability.
> 

Good point: if physical addresses are not available, we can't use IOVA as PA mode.

-- 
Thanks,
Anatoly
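
For context on what "physical addresses are available" can mean on Linux, here is a rough, standalone sketch of one possible probe. It is an assumption for illustration, not the check EAL uses: `phys_addrs_available()` is a hypothetical name, and the probe reads the calling process's own /proc/self/pagemap entry for a stack variable. On recent kernels an unprivileged reader typically sees a zeroed page frame number there.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical probe (not from the patch): returns 1 if this process can read
 * a non-zero page frame number for one of its own resident pages, and 0
 * otherwise, including when /proc/self/pagemap cannot be opened or read.
 */
static int
phys_addrs_available(void)
{
	volatile char probe = 1;	/* written, so its page is resident */
	uint64_t entry = 0;
	long page_size = sysconf(_SC_PAGESIZE);
	uintptr_t vaddr = (uintptr_t)&probe;
	off_t offset;
	ssize_t nread;
	int fd;

	if (page_size <= 0)
		return 0;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return 0;

	/* pagemap holds one 64-bit entry per virtual page */
	offset = (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(entry);
	nread = pread(fd, &entry, sizeof(entry), offset);
	close(fd);

	if (nread != (ssize_t)sizeof(entry))
		return 0;

	/* bit 63: page present; bits 0-54: page frame number */
	if ((entry >> 63) == 0)
		return 0;
	return (entry & ((UINT64_C(1) << 55) - 1)) != 0;
}
```

The result depends on the privileges of the calling process, so the same binary may report 1 when run as root and 0 otherwise.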


* Re: [dpdk-dev] [PATCH] eal: pick IOVA as PA if IOMMU is not available
  2019-07-25  9:31   ` Burakov, Anatoly
@ 2019-07-25  9:35     ` David Marchand
  2019-07-25  9:38       ` Burakov, Anatoly
  0 siblings, 1 reply; 17+ messages in thread
From: David Marchand @ 2019-07-25  9:35 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, Jerin Jacob Kollanukkaran, Thomas Monjalon

On Thu, Jul 25, 2019 at 11:31 AM Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
>
> On 25-Jul-19 9:05 AM, David Marchand wrote:
> > On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
> > <anatoly.burakov@intel.com> wrote:
> >>
> >> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> >> populated. This is happening since at least 3.6 when VFIO support
> >> was added. If the directory is empty, EAL should not pick IOVA as
> >> VA as the default IOVA mode.
> >>
> >> We also assume that VFIO equals IOMMU, so if VFIO support is not
> >> compiled, we always assume IOMMU support is not available.
> >
> > Not sure I agree with this statement.
> > What about unknown (from eal pov) kernel drivers?
>
> Are there any cases where we can use IOVA as VA mode without having VFIO
> compiled?

There would be, if a PMD relied on a kernel driver that EAL doesn't know about.
That is not the case today afaik, but I'd prefer we don't mix VFIO and IOMMU.


-- 
David Marchand


* Re: [dpdk-dev] [PATCH] eal: pick IOVA as PA if IOMMU is not available
  2019-07-25  9:35     ` David Marchand
@ 2019-07-25  9:38       ` Burakov, Anatoly
  2019-07-25  9:40         ` Burakov, Anatoly
  0 siblings, 1 reply; 17+ messages in thread
From: Burakov, Anatoly @ 2019-07-25  9:38 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Jerin Jacob Kollanukkaran, Thomas Monjalon

On 25-Jul-19 10:35 AM, David Marchand wrote:
> On Thu, Jul 25, 2019 at 11:31 AM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
>>
>> On 25-Jul-19 9:05 AM, David Marchand wrote:
>>> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
>>> <anatoly.burakov@intel.com> wrote:
>>>>
>>>> When IOMMU is not available, /sys/kernel/iommu_groups will not be
>>>> populated. This is happening since at least 3.6 when VFIO support
>>>> was added. If the directory is empty, EAL should not pick IOVA as
>>>> VA as the default IOVA mode.
>>>>
>>>> We also assume that VFIO equals IOMMU, so if VFIO support is not
>>>> compiled, we always assume IOMMU support is not available.
>>>
>>> Not sure I agree with this statement.
>>> What about unknown (from eal pov) kernel drivers?
>>
>> Are there any cases where we can use IOVA as VA mode without having VFIO
>> compiled?
> 
> If a pmd relies on a kernel driver we don't know in EAL.
> This is not the case afaik, but I'd prefer we don't mix vfio and iommu.
> 

OK, I can drop that.

-- 
Thanks,
Anatoly


* Re: [dpdk-dev] [PATCH] eal: pick IOVA as PA if IOMMU is not available
  2019-07-25  9:38       ` Burakov, Anatoly
@ 2019-07-25  9:40         ` Burakov, Anatoly
  0 siblings, 0 replies; 17+ messages in thread
From: Burakov, Anatoly @ 2019-07-25  9:40 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Jerin Jacob Kollanukkaran, Thomas Monjalon

On 25-Jul-19 10:38 AM, Burakov, Anatoly wrote:
> On 25-Jul-19 10:35 AM, David Marchand wrote:
>> On Thu, Jul 25, 2019 at 11:31 AM Burakov, Anatoly
>> <anatoly.burakov@intel.com> wrote:
>>>
>>> On 25-Jul-19 9:05 AM, David Marchand wrote:
>>>> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
>>>> <anatoly.burakov@intel.com> wrote:
>>>>>
>>>>> When IOMMU is not available, /sys/kernel/iommu_groups will not be
>>>>> populated. This is happening since at least 3.6 when VFIO support
>>>>> was added. If the directory is empty, EAL should not pick IOVA as
>>>>> VA as the default IOVA mode.
>>>>>
>>>>> We also assume that VFIO equals IOMMU, so if VFIO support is not
>>>>> compiled, we always assume IOMMU support is not available.
>>>>
>>>> Not sure I agree with this statement.
>>>> What about unknown (from eal pov) kernel drivers?
>>>
>>> Are there any cases where we can use IOVA as VA mode without having VFIO
>>> compiled?
>>
>> If a pmd relies on a kernel driver we don't know in EAL.
>> This is not the case afaik, but I'd prefer we don't mix vfio and iommu.
>>
> 
> OK, i can drop that.
> 

By the way, would the kernel report IOMMU groups in that case? As in, would
/sys/kernel/iommu_groups be populated?

-- 
Thanks,
Anatoly


* [dpdk-dev] [PATCH v2] eal: pick IOVA as PA if IOMMU is not available
  2019-07-24 16:46 [dpdk-dev] [PATCH] eal: pick IOVA as PA if IOMMU is not available Anatoly Burakov
  2019-07-25  8:05 ` David Marchand
@ 2019-07-25  9:52 ` Anatoly Burakov
  2019-07-25  9:56   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
  2019-07-25 11:05   ` [dpdk-dev] [PATCH v3] " Anatoly Burakov
  1 sibling, 2 replies; 17+ messages in thread
From: Anatoly Burakov @ 2019-07-25  9:52 UTC (permalink / raw)
  To: dev; +Cc: david.marchand, jerinj, thomas

When an IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This has been the case since at least Linux 3.6, when VFIO
support was added. If the directory is empty, EAL should not pick IOVA
as VA as the default IOVA mode.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Decouple IOMMU from VFIO
    - Add a check for physical addresses availability

 lib/librte_eal/linux/eal/eal.c      | 21 ++++++++++++++++++--
 lib/librte_eal/linux/eal/eal_vfio.c | 30 +++++++++++++++++++++++++++++
 lib/librte_eal/linux/eal/eal_vfio.h |  2 ++
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 34db78753..207ee0b1c 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -1061,8 +1061,25 @@ rte_eal_init(int argc, char **argv)
 		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
 
 		if (iova_mode == RTE_IOVA_DC) {
-			iova_mode = RTE_IOVA_VA;
-			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
+			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
+
+			if (!phys_addrs) {
+				/* if we have no access to physical addreses,
+				 * pick IOVA as VA mode.
+				 */
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
+			} else if (vfio_iommu_enabled()) {
+				/* we have an IOMMU, pick IOVA as VA mode */
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
+			} else {
+				/* physical addresses available, and no IOMMU
+				 * found, so pick IOVA as PA.
+				 */
+				iova_mode = RTE_IOVA_PA;
+				RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
+			}
 		}
 #ifdef RTE_LIBRTE_KNI
 		/* Workaround for KNI which requires physical address to work */
diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 501c74f23..92d290284 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2018 Intel Corporation
  */
 
+#include <dirent.h>
 #include <inttypes.h>
 #include <string.h>
 #include <fcntl.h>
@@ -19,6 +20,8 @@
 #include "eal_vfio.h"
 #include "eal_private.h"
 
+#define VFIO_KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
+
 #ifdef VFIO_PRESENT
 
 #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
@@ -2147,3 +2150,30 @@ rte_vfio_container_dma_unmap(__rte_unused int container_fd,
 }
 
 #endif /* VFIO_PRESENT */
+
+/*
+ * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
+ * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
+ * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
+ * checking if the path is empty will tell us if IOMMU is enabled.
+ */
+int
+vfio_iommu_enabled(void)
+{
+	DIR *dir = opendir(VFIO_KERNEL_IOMMU_GROUPS_PATH);
+	struct dirent *d;
+	int n = 0;
+
+	/* if directory doesn't exist, assume IOMMU is not enabled */
+	if (dir == NULL)
+		return 0;
+
+	while ((d = readdir(dir)) != NULL) {
+		/* skip dot and dot-dot */
+		if (++n > 2)
+			break;
+	}
+	closedir(dir);
+
+	return n > 2;
+}
diff --git a/lib/librte_eal/linux/eal/eal_vfio.h b/lib/librte_eal/linux/eal/eal_vfio.h
index cb2d35fb1..58c7a7309 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.h
+++ b/lib/librte_eal/linux/eal/eal_vfio.h
@@ -133,6 +133,8 @@ vfio_has_supported_extensions(int vfio_container_fd);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_iommu_enabled(void);
+
 #define EAL_VFIO_MP "eal_vfio_mp_sync"
 
 #define SOCKET_REQ_CONTAINER 0x100
-- 
2.17.1


* Re: [dpdk-dev] [EXT] [PATCH v2] eal: pick IOVA as PA if IOMMU is not available
  2019-07-25  9:52 ` [dpdk-dev] [PATCH v2] " Anatoly Burakov
@ 2019-07-25  9:56   ` Jerin Jacob Kollanukkaran
  2019-07-25 11:05   ` [dpdk-dev] [PATCH v3] " Anatoly Burakov
  1 sibling, 0 replies; 17+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-25  9:56 UTC (permalink / raw)
  To: Anatoly Burakov, dev; +Cc: david.marchand, thomas


> -----Original Message-----
> From: Anatoly Burakov <anatoly.burakov@intel.com>
> Sent: Thursday, July 25, 2019 3:22 PM
> To: dev@dpdk.org
> Cc: david.marchand@redhat.com; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; thomas@monjalon.net
> Subject: [EXT] [PATCH v2] eal: pick IOVA as PA if IOMMU is not available
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This is happening since at least 3.6 when VFIO support was added. If
> the directory is empty, EAL should not pick IOVA as VA as the default IOVA
> mode.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> 
> Notes:
>     v2:
>     - Decouple IOMMU from VFIO
>     - Add a check for physical addresses availability
> 
>  lib/librte_eal/linux/eal/eal.c      | 21 ++++++++++++++++++--
>  lib/librte_eal/linux/eal/eal_vfio.c | 30 +++++++++++++++++++++++++++++
>  lib/librte_eal/linux/eal/eal_vfio.h |  2 ++

Please update the documentation as well.


* [dpdk-dev] [PATCH v3] eal: pick IOVA as PA if IOMMU is not available
  2019-07-25  9:52 ` [dpdk-dev] [PATCH v2] " Anatoly Burakov
  2019-07-25  9:56   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
@ 2019-07-25 11:05   ` Anatoly Burakov
  2019-07-26  5:08     ` Stojaczyk, Dariusz
  2019-07-26 15:37     ` [dpdk-dev] [PATCH v4] " Anatoly Burakov
  1 sibling, 2 replies; 17+ messages in thread
From: Anatoly Burakov @ 2019-07-25 11:05 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, dariusz.stojaczyk, thomas,
	david.marchand, jerinj

When an IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This has been the case since at least Linux 3.6, when VFIO
support was added. If the directory is empty, EAL should not pick IOVA
as VA as the default IOVA mode.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v3:
    - Add documentation changes
    - Fix a typo pointed out by checkpatch
    
    v2:
    - Decouple IOMMU from VFIO
    - Add a check for physical addresses availability

 .../prog_guide/env_abstraction_layer.rst      | 27 +++++++++++------
 doc/guides/rel_notes/known_issues.rst         | 26 ++++++++++++++++
 doc/guides/rel_notes/release_19_08.rst        | 16 ++++++++++
 lib/librte_eal/linux/eal/eal.c                | 21 +++++++++++--
 lib/librte_eal/linux/eal/eal_vfio.c           | 30 +++++++++++++++++++
 lib/librte_eal/linux/eal/eal_vfio.h           |  2 ++
 6 files changed, 111 insertions(+), 11 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 1487ea550..e6e70e5a8 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -425,6 +425,9 @@ IOVA Mode Detection
 IOVA Mode is selected by considering what the current usable Devices on the
 system require and/or support.
 
+On FreeBSD, RTE_IOVA_VA mode is not supported, so RTE_IOVA_PA is always used.
+On Linux, the IOVA mode is detected based on a heuristic.
+
 Below is the 2-step heuristic for this choice.
 
 For the first step, EAL asks each bus its requirement in terms of IOVA mode
@@ -438,20 +441,26 @@ and decides on a preferred IOVA mode.
   RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
   check on Physical Addresses availability),
 
+If the buses have expressed no preference on which IOVA mode to pick, then a
+default is selected using the following logic:
+
+- if physical addresses are not available, RTE_IOVA_VA mode is used
+- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
+- otherwise, RTE_IOVA_PA mode is used
+
+In the case when the buses had disagreed on their preferred IOVA mode, part of
+the buses won't work because of this decision.
+
 The second step checks if the preferred mode complies with the Physical
 Addresses availability since those are only available to root user in recent
-kernels.
-
-- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
-  Addresses, then EAL init fails early, since later probing of the devices
-  would fail anyway,
-- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
-  In the case when the buses had disagreed on the IOVA Mode at the first step,
-  part of the buses won't work because of this decision.
+kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
+Physical Addresses, then EAL init fails early, since later probing of the
+devices would fail anyway.
 
 .. note::
 
-    The RTE_IOVA_VA mode is selected as the default for the following reasons:
+    The RTE_IOVA_VA mode is preferred as the default in most cases for the
+    following reasons:
 
     - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
       physical address availability.
diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
index 276327c15..0b50c8306 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
@@ -861,3 +861,29 @@ AVX-512 support disabled
 
 **Driver/Module**:
     ALL.
+
+
+Unsuitable IOVA mode may be picked as the default
+----------------------------------------------------------------
+**Description**
+   Not all kernel drivers and not all devices support all IOVA modes. EAL will
+   attempt to pick a reasonable default based on a number of factors, but there
+   may be cases where the default may be unsuitable (for example, hotplugging
+   devices using `igb_uio` driver while having picked IOVA as VA mode on EAL
+   initialization).
+
+**Implication**
+   Some devices (hotplugged or otherwise) may not work due to incompatible IOVA
+   mode being automatically picked by EAL.
+
+**Resolution/Workaround**:
+   It is possible to force EAL to pick a particular IOVA mode by using the
+   `--iova-mode` command-line parameter. If conflicting requirements are present
+   (such as one device requiring IOVA as PA and one requiring IOVA as VA mode),
+   there is no workaround.
+
+**Affected Environment/Platform**:
+   Linux.
+
+**Driver/Module**:
+   ALL.
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index c9bd3ce18..28af90985 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -56,6 +56,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **EAL will now pick IOVA as VA mode as the default in most cases.**
+
+  Previously, preferred default IOVA mode was selected to be IOVA as PA. The
+  behavior has now been changed to handle IOVA mode detection in a more complex
+  manner, and will default to IOVA as VA in most cases.
+
 * **Added MCS lock.**
 
   MCS lock provides scalability by spinning on a CPU/thread local variable
@@ -436,6 +442,16 @@ Known Issues
    =========================================================
 
 
+   * **Unsuitable IOVA mode may be picked as the default**
+
+     Not all kernel drivers and not all devices support all IOVA modes. EAL will
+     attempt to pick a reasonable default based on a number of factors, but
+     there may be cases where the default may be unsuitable.
+
+     It is recommended to use the `--iova-mode` command-line parameter if the
+     default is not suitable.
+
+
 Tested Platforms
 ----------------
 
diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 34db78753..29972b896 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -1061,8 +1061,25 @@ rte_eal_init(int argc, char **argv)
 		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
 
 		if (iova_mode == RTE_IOVA_DC) {
-			iova_mode = RTE_IOVA_VA;
-			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
+			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
+
+			if (!phys_addrs) {
+				/* if we have no access to physical addresses,
+				 * pick IOVA as VA mode.
+				 */
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
+			} else if (vfio_iommu_enabled()) {
+				/* we have an IOMMU, pick IOVA as VA mode */
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
+			} else {
+				/* physical addresses available, and no IOMMU
+				 * found, so pick IOVA as PA.
+				 */
+				iova_mode = RTE_IOVA_PA;
+				RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
+			}
 		}
 #ifdef RTE_LIBRTE_KNI
 		/* Workaround for KNI which requires physical address to work */
diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 501c74f23..92d290284 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2018 Intel Corporation
  */
 
+#include <dirent.h>
 #include <inttypes.h>
 #include <string.h>
 #include <fcntl.h>
@@ -19,6 +20,8 @@
 #include "eal_vfio.h"
 #include "eal_private.h"
 
+#define VFIO_KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
+
 #ifdef VFIO_PRESENT
 
 #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
@@ -2147,3 +2150,30 @@ rte_vfio_container_dma_unmap(__rte_unused int container_fd,
 }
 
 #endif /* VFIO_PRESENT */
+
+/*
+ * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
+ * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
+ * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
+ * checking if the path is empty will tell us if IOMMU is enabled.
+ */
+int
+vfio_iommu_enabled(void)
+{
+	DIR *dir = opendir(VFIO_KERNEL_IOMMU_GROUPS_PATH);
+	struct dirent *d;
+	int n = 0;
+
+	/* if directory doesn't exist, assume IOMMU is not enabled */
+	if (dir == NULL)
+		return 0;
+
+	while ((d = readdir(dir)) != NULL) {
+		/* skip dot and dot-dot */
+		if (++n > 2)
+			break;
+	}
+	closedir(dir);
+
+	return n > 2;
+}
diff --git a/lib/librte_eal/linux/eal/eal_vfio.h b/lib/librte_eal/linux/eal/eal_vfio.h
index cb2d35fb1..58c7a7309 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.h
+++ b/lib/librte_eal/linux/eal/eal_vfio.h
@@ -133,6 +133,8 @@ vfio_has_supported_extensions(int vfio_container_fd);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_iommu_enabled(void);
+
 #define EAL_VFIO_MP "eal_vfio_mp_sync"
 
 #define SOCKET_REQ_CONTAINER 0x100
-- 
2.17.1
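
The two-step heuristic described in the documentation hunk of this patch can be summarised as a pair of pure functions. This is a sketch for illustration only: the `enum iova_mode` values are local stand-ins for DPDK's `enum rte_iova_mode`, and `pick_default_iova()` and `select_iova_mode()` are hypothetical names that do not exist in EAL.

```c
#include <stdbool.h>

/* Local stand-ins for illustration; not DPDK's enum rte_iova_mode. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Default used when the buses expressed no preference (hypothetical helper):
 * VA if physical addresses are unavailable or IOMMU groups are populated,
 * PA otherwise.
 */
static enum iova_mode
pick_default_iova(bool phys_addrs, bool iommu_groups_populated)
{
	if (!phys_addrs)
		return IOVA_VA;
	if (iommu_groups_populated)
		return IOVA_VA;
	return IOVA_PA;
}

/*
 * Hypothetical wrapper for both steps: resolve a "don't care" preference to
 * a default, then reject PA mode when physical addresses are unavailable.
 * Returns 0 and stores the mode in *out on success, -1 on failure.
 */
static int
select_iova_mode(enum iova_mode bus_pref, bool phys_addrs,
		bool iommu_groups_populated, enum iova_mode *out)
{
	enum iova_mode mode = bus_pref;

	if (mode == IOVA_DC)
		mode = pick_default_iova(phys_addrs, iommu_groups_populated);

	if (mode == IOVA_PA && !phys_addrs)
		return -1;

	*out = mode;
	return 0;
}
```

With no bus preference, this yields PA only when physical addresses are readable and no IOMMU groups are found. A bus that insists on PA without access to physical addresses makes the selection fail, which corresponds to EAL init failing early in the documented behaviour.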


* Re: [dpdk-dev] [PATCH] eal: pick IOVA as PA if IOMMU is not available
  2019-07-25  8:05 ` David Marchand
  2019-07-25  9:31   ` Burakov, Anatoly
@ 2019-07-25 18:58   ` Thomas Monjalon
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Monjalon @ 2019-07-25 18:58 UTC (permalink / raw)
  To: David Marchand, Anatoly Burakov; +Cc: dev, Jerin Jacob Kollanukkaran

On Thu, 25 Jul 2019, at 10:05, David Marchand wrote:
> On Wed, Jul 24, 2019 at 6:46 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
> >
> > When IOMMU is not available, /sys/kernel/iommu_groups will not be
> > populated. This is happening since at least 3.6 when VFIO support
> > was added. If the directory is empty, EAL should not pick IOVA as
> > VA as the default IOVA mode.
> >
> > We also assume that VFIO equals IOMMU, so if VFIO support is not
> > compiled, we always assume IOMMU support is not available.
> 
> Not sure I agree with this statement.
> What about unknown (from eal pov) kernel drivers?

Exactly, this is the case for Mellanox drivers.


* Re: [dpdk-dev] [PATCH v3] eal: pick IOVA as PA if IOMMU is not available
  2019-07-25 11:05   ` [dpdk-dev] [PATCH v3] " Anatoly Burakov
@ 2019-07-26  5:08     ` Stojaczyk, Dariusz
  2019-07-26 15:37     ` [dpdk-dev] [PATCH v4] " Anatoly Burakov
  1 sibling, 0 replies; 17+ messages in thread
From: Stojaczyk, Dariusz @ 2019-07-26  5:08 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Mcnamara, John, Kovacevic, Marko, thomas, david.marchand, jerinj

> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Thursday, July 25, 2019 1:06 PM
> To: dev@dpdk.org
> Cc: Mcnamara, John <john.mcnamara@intel.com>; Kovacevic, Marko
> <marko.kovacevic@intel.com>; Stojaczyk, Dariusz
> <dariusz.stojaczyk@intel.com>; thomas@monjalon.net;
> david.marchand@redhat.com; jerinj@marvell.com
> Subject: [PATCH v3] eal: pick IOVA as PA if IOMMU is not available
> 
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This is happening since at least 3.6 when VFIO support
> was added. If the directory is empty, EAL should not pick IOVA as
> VA as the default IOVA mode.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---

Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>

Thanks!


* [dpdk-dev] [PATCH v4] eal: pick IOVA as PA if IOMMU is not available
  2019-07-25 11:05   ` [dpdk-dev] [PATCH v3] " Anatoly Burakov
  2019-07-26  5:08     ` Stojaczyk, Dariusz
@ 2019-07-26 15:37     ` Anatoly Burakov
  2019-07-29  9:31       ` David Marchand
  2019-07-29 13:52       ` [dpdk-dev] [PATCH v5] " Anatoly Burakov
  1 sibling, 2 replies; 17+ messages in thread
From: Anatoly Burakov @ 2019-07-26 15:37 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, dariusz.stojaczyk, thomas,
	david.marchand, jerinj

When IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This has been the case since at least Linux 3.6, when VFIO
support was added. If the directory is empty, EAL should not pick IOVA
as VA as the default IOVA mode.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
---

Notes:
    v4:
    - Fix indentation in release notes' known issues
    
    v3:
    - Add documentation changes
    - Fix a typo pointed out by checkpatch
    
    v2:
    - Decouple IOMMU from VFIO
    - Add a check for physical addresses availability

 .../prog_guide/env_abstraction_layer.rst      | 27 +++++++++++------
 doc/guides/rel_notes/known_issues.rst         | 26 ++++++++++++++++
 doc/guides/rel_notes/release_19_08.rst        | 16 ++++++++++
 lib/librte_eal/linux/eal/eal.c                | 21 +++++++++++--
 lib/librte_eal/linux/eal/eal_vfio.c           | 30 +++++++++++++++++++
 lib/librte_eal/linux/eal/eal_vfio.h           |  2 ++
 6 files changed, 111 insertions(+), 11 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 1487ea550..e6e70e5a8 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -425,6 +425,9 @@ IOVA Mode Detection
 IOVA Mode is selected by considering what the current usable Devices on the
 system require and/or support.
 
+On FreeBSD, RTE_IOVA_VA mode is not supported, so RTE_IOVA_PA is always used.
+On Linux, the IOVA mode is detected based on a heuristic.
+
 Below is the 2-step heuristic for this choice.
 
 For the first step, EAL asks each bus its requirement in terms of IOVA mode
@@ -438,20 +441,26 @@ and decides on a preferred IOVA mode.
   RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
   check on Physical Addresses availability),
 
+If the buses have expressed no preference on which IOVA mode to pick, then a
+default is selected using the following logic:
+
+- if physical addresses are not available, RTE_IOVA_VA mode is used
+- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
+- otherwise, RTE_IOVA_PA mode is used
+
+In the case when the buses had disagreed on their preferred IOVA mode, part of
+the buses won't work because of this decision.
+
 The second step checks if the preferred mode complies with the Physical
 Addresses availability since those are only available to root user in recent
-kernels.
-
-- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
-  Addresses, then EAL init fails early, since later probing of the devices
-  would fail anyway,
-- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
-  In the case when the buses had disagreed on the IOVA Mode at the first step,
-  part of the buses won't work because of this decision.
+kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
+Physical Addresses, then EAL init fails early, since later probing of the
+devices would fail anyway.
 
 .. note::
 
-    The RTE_IOVA_VA mode is selected as the default for the following reasons:
+    The RTE_IOVA_VA mode is preferred as the default in most cases for the
+    following reasons:
 
     - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
       physical address availability.
diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
index 276327c15..0b50c8306 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
@@ -861,3 +861,29 @@ AVX-512 support disabled
 
 **Driver/Module**:
     ALL.
+
+
+Unsuitable IOVA mode may be picked as the default
+----------------------------------------------------------------
+**Description**
+   Not all kernel drivers and not all devices support all IOVA modes. EAL will
+   attempt to pick a reasonable default based on a number of factors, but there
+   may be cases where the default may be unsuitable (for example, hotplugging
+   devices using `igb_uio` driver while having picked IOVA as VA mode on EAL
+   initialization).
+
+**Implication**
+   Some devices (hotplugged or otherwise) may not work due to incompatible IOVA
+   mode being automatically picked by EAL.
+
+**Resolution/Workaround**:
+   It is possible to force EAL to pick a particular IOVA mode by using the
+   `--iova-mode` command-line parameter. If conflicting requirements are present
+   (such as one device requiring IOVA as PA and one requiring IOVA as VA mode),
+   there is no workaround.
+
+**Affected Environment/Platform**:
+   Linux.
+
+**Driver/Module**:
+   ALL.
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index c9bd3ce18..b399ca536 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -56,6 +56,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **EAL will now pick IOVA as VA mode as the default in most cases.**
+
+  Previously, preferred default IOVA mode was selected to be IOVA as PA. The
+  behavior has now been changed to handle IOVA mode detection in a more complex
+  manner, and will default to IOVA as VA in most cases.
+
 * **Added MCS lock.**
 
   MCS lock provides scalability by spinning on a CPU/thread local variable
@@ -436,6 +442,16 @@ Known Issues
    =========================================================
 
 
+* **Unsuitable IOVA mode may be picked as the default**
+
+  Not all kernel drivers and not all devices support all IOVA modes. EAL will
+  attempt to pick a reasonable default based on a number of factors, but
+  there may be cases where the default may be unsuitable.
+
+  It is recommended to use the `--iova-mode` command-line parameter if the
+  default is not suitable.
+
+
 Tested Platforms
 ----------------
 
diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 34db78753..29972b896 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -1061,8 +1061,25 @@ rte_eal_init(int argc, char **argv)
 		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
 
 		if (iova_mode == RTE_IOVA_DC) {
-			iova_mode = RTE_IOVA_VA;
-			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
+			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
+
+			if (!phys_addrs) {
+				/* if we have no access to physical addresses,
+				 * pick IOVA as VA mode.
+				 */
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
+			} else if (vfio_iommu_enabled()) {
+				/* we have an IOMMU, pick IOVA as VA mode */
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
+			} else {
+				/* physical addresses available, and no IOMMU
+				 * found, so pick IOVA as PA.
+				 */
+				iova_mode = RTE_IOVA_PA;
+				RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
+			}
 		}
 #ifdef RTE_LIBRTE_KNI
 		/* Workaround for KNI which requires physical address to work */
diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 501c74f23..92d290284 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2018 Intel Corporation
  */
 
+#include <dirent.h>
 #include <inttypes.h>
 #include <string.h>
 #include <fcntl.h>
@@ -19,6 +20,8 @@
 #include "eal_vfio.h"
 #include "eal_private.h"
 
+#define VFIO_KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
+
 #ifdef VFIO_PRESENT
 
 #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
@@ -2147,3 +2150,30 @@ rte_vfio_container_dma_unmap(__rte_unused int container_fd,
 }
 
 #endif /* VFIO_PRESENT */
+
+/*
+ * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
+ * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
+ * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
+ * checking if the path is empty will tell us if IOMMU is enabled.
+ */
+int
+vfio_iommu_enabled(void)
+{
+	DIR *dir = opendir(VFIO_KERNEL_IOMMU_GROUPS_PATH);
+	struct dirent *d;
+	int n = 0;
+
+	/* if directory doesn't exist, assume IOMMU is not enabled */
+	if (dir == NULL)
+		return 0;
+
+	while ((d = readdir(dir)) != NULL) {
+		/* skip dot and dot-dot */
+		if (++n > 2)
+			break;
+	}
+	closedir(dir);
+
+	return n > 2;
+}
diff --git a/lib/librte_eal/linux/eal/eal_vfio.h b/lib/librte_eal/linux/eal/eal_vfio.h
index cb2d35fb1..58c7a7309 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.h
+++ b/lib/librte_eal/linux/eal/eal_vfio.h
@@ -133,6 +133,8 @@ vfio_has_supported_extensions(int vfio_container_fd);
 
 int vfio_mp_sync_setup(void);
 
+int vfio_iommu_enabled(void);
+
 #define EAL_VFIO_MP "eal_vfio_mp_sync"
 
 #define SOCKET_REQ_CONTAINER 0x100
-- 
2.17.1
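
The default selection described in the documentation hunk above (no
physical addresses: VA; IOMMU groups present: VA; otherwise: PA) can be
sketched as a pure function. This is only an illustration, not the code
from the patch: the enum and the function name pick_default_iova_mode()
are hypothetical stand-ins for enum rte_iova_mode and the inline logic
in rte_eal_init(), and the two booleans are assumed to have been
computed by the caller.

```c
#include <stdbool.h>

/* Hypothetical stand-in for DPDK's enum rte_iova_mode. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Resolve the buses' preference into a concrete IOVA mode.
 * bus_pref      - what the buses asked for (IOVA_DC means "don't care")
 * phys_addrs    - true if physical addresses are readable by this process
 * iommu_present - true if the kernel exposes at least one IOMMU group
 */
static enum iova_mode
pick_default_iova_mode(enum iova_mode bus_pref, bool phys_addrs,
		bool iommu_present)
{
	/* an explicit bus preference is left untouched */
	if (bus_pref != IOVA_DC)
		return bus_pref;

	/* without physical addresses, IOVA as PA cannot work at all */
	if (!phys_addrs)
		return IOVA_VA;

	/* with an IOMMU, IOVA as VA is the preferred default */
	if (iommu_present)
		return IOVA_VA;

	/* physical addresses available and no IOMMU found */
	return IOVA_PA;
}
```

Keeping the decision free of side effects makes the three branches easy
to check in isolation; the logging done in the patch would stay at the
call site.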


* Re: [dpdk-dev] [PATCH v4] eal: pick IOVA as PA if IOMMU is not available
  2019-07-26 15:37     ` [dpdk-dev] [PATCH v4] " Anatoly Burakov
@ 2019-07-29  9:31       ` David Marchand
  2019-07-29 11:18         ` Burakov, Anatoly
  2019-07-29 13:52       ` [dpdk-dev] [PATCH v5] " Anatoly Burakov
  1 sibling, 1 reply; 17+ messages in thread
From: David Marchand @ 2019-07-29  9:31 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, John McNamara, Marko Kovacevic, dariusz.stojaczyk,
	Thomas Monjalon, Jerin Jacob Kollanukkaran

On Fri, Jul 26, 2019 at 5:37 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:
>
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This is happening since at least 3.6 when VFIO support
> was added. If the directory is empty, EAL should not pick IOVA as
> VA as the default IOVA mode.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> Tested-by: Jerin Jacob <jerinj@marvell.com>
> Reviewed-by: Jerin Jacob <jerinj@marvell.com>
> ---
>
> Notes:
>     v4:
>     - Fix indentation in release notes' known issues
>
>     v3:
>     - Add documentation changes
>     - Fix a typo pointed out by checkpatch
>
>     v2:
>     - Decouple IOMMU from VFIO
>     - Add a check for physical addresses availability
>
>  .../prog_guide/env_abstraction_layer.rst      | 27 +++++++++++------
>  doc/guides/rel_notes/known_issues.rst         | 26 ++++++++++++++++
>  doc/guides/rel_notes/release_19_08.rst        | 16 ++++++++++
>  lib/librte_eal/linux/eal/eal.c                | 21 +++++++++++--
>  lib/librte_eal/linux/eal/eal_vfio.c           | 30 +++++++++++++++++++
>  lib/librte_eal/linux/eal/eal_vfio.h           |  2 ++
>  6 files changed, 111 insertions(+), 11 deletions(-)
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 1487ea550..e6e70e5a8 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -425,6 +425,9 @@ IOVA Mode Detection
>  IOVA Mode is selected by considering what the current usable Devices on the
>  system require and/or support.
>
> +On FreeBSD, RTE_IOVA_VA mode is not supported, so RTE_IOVA_PA is always used.

We still allow setting it via --iova-mode=
Is it really unsupported ? vdev like rings could work.


> +On Linux, the IOVA mode is detected based on a heuristic.
> +
>  Below is the 2-step heuristic for this choice.

We can combine those two sentences as a single one.


>
>  For the first step, EAL asks each bus its requirement in terms of IOVA mode
> @@ -438,20 +441,26 @@ and decides on a preferred IOVA mode.
>    RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
>    check on Physical Addresses availability),
>
> +If the buses have expressed no preference on which IOVA mode to pick, then a
> +default is selected using the following logic:
> +
> +- if physical addresses are not available, RTE_IOVA_VA mode is used
> +- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
> +- otherwise, RTE_IOVA_PA mode is used
> +
> +In the case when the buses had disagreed on their preferred IOVA mode, part of
> +the buses won't work because of this decision.
> +
>  The second step checks if the preferred mode complies with the Physical
>  Addresses availability since those are only available to root user in recent
> -kernels.
> -
> -- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> -  Addresses, then EAL init fails early, since later probing of the devices
> -  would fail anyway,
> -- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
> -  In the case when the buses had disagreed on the IOVA Mode at the first step,
> -  part of the buses won't work because of this decision.
> +kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
> +Physical Addresses, then EAL init fails early, since later probing of the
> +devices would fail anyway.
>
>  .. note::
>
> -    The RTE_IOVA_VA mode is selected as the default for the following reasons:
> +    The RTE_IOVA_VA mode is preferred as the default in most cases for the
> +    following reasons:
>
>      - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
>        physical address availability.
> diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
> index 276327c15..0b50c8306 100644
> --- a/doc/guides/rel_notes/known_issues.rst
> +++ b/doc/guides/rel_notes/known_issues.rst
> @@ -861,3 +861,29 @@ AVX-512 support disabled
>
>  **Driver/Module**:
>      ALL.
> +
> +
> +Unsuitable IOVA mode may be picked as the default
> +----------------------------------------------------------------
> +**Description**
> +   Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +   attempt to pick a reasonable default based on a number of factors, but there
> +   may be cases where the default may be unsuitable (for example, hotplugging
> +   devices using `igb_uio` driver while having picked IOVA as VA mode on EAL
> +   initialization).
> +
> +**Implication**
> +   Some devices (hotplugged or otherwise) may not work due to incompatible IOVA
> +   mode being automatically picked by EAL.
> +
> +**Resolution/Workaround**:
> +   It is possible to force EAL to pick a particular IOVA mode by using the
> +   `--iova-mode` command-line parameter. If conflicting requirements are present
> +   (such as one device requiring IOVA as PA and one requiring IOVA as VA mode),
> +   there is no workaround.
> +
> +**Affected Environment/Platform**:
> +   Linux.
> +
> +**Driver/Module**:
> +   ALL.
> diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
> index c9bd3ce18..b399ca536 100644
> --- a/doc/guides/rel_notes/release_19_08.rst
> +++ b/doc/guides/rel_notes/release_19_08.rst
> @@ -56,6 +56,12 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
>
> +* **EAL will now pick IOVA as VA mode as the default in most cases.**
> +
> +  Previously, preferred default IOVA mode was selected to be IOVA as PA. The
> +  behavior has now been changed to handle IOVA mode detection in a more complex
> +  manner, and will default to IOVA as VA in most cases.
> +
>  * **Added MCS lock.**
>
>    MCS lock provides scalability by spinning on a CPU/thread local variable
> @@ -436,6 +442,16 @@ Known Issues
>     =========================================================
>
>
> +* **Unsuitable IOVA mode may be picked as the default**
> +
> +  Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +  attempt to pick a reasonable default based on a number of factors, but
> +  there may be cases where the default may be unsuitable.
> +
> +  It is recommended to use the `--iova-mode` command-line parameter if the
> +  default is not suitable.
> +
> +
>  Tested Platforms
>  ----------------
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 34db78753..29972b896 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -1061,8 +1061,25 @@ rte_eal_init(int argc, char **argv)
>                 enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>
>                 if (iova_mode == RTE_IOVA_DC) {
> -                       iova_mode = RTE_IOVA_VA;
> -                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
> +                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
> +
> +                       if (!phys_addrs) {
> +                               /* if we have no access to physical addresses,
> +                                * pick IOVA as VA mode.
> +                                */
> +                               iova_mode = RTE_IOVA_VA;
> +                               RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
> +                       } else if (vfio_iommu_enabled()) {

How about:
s/vfio_iommu_enabled/is_iommu_available/

And the code would move from vfio specific files to eal.c.
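
A minimal sketch of what such a VFIO-independent helper might look like
once it lives outside the VFIO sources. This is an assumption-laden
illustration, not the patch's code: the name dir_has_entries() and the
path parameter are hypothetical (the patch hardcodes
/sys/kernel/iommu_groups), and the helper compares entry names instead
of counting past the first two entries.

```c
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if 'path' is a readable directory containing at least one
 * entry other than "." and "..". A missing or unreadable directory is
 * treated as empty, mirroring the "assume IOMMU is not enabled" fallback.
 */
static bool
dir_has_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *d;
	bool found = false;

	if (dir == NULL)
		return false;

	while ((d = readdir(dir)) != NULL) {
		/* skip dot and dot-dot by name */
		if (strcmp(d->d_name, ".") == 0 ||
				strcmp(d->d_name, "..") == 0)
			continue;
		found = true;
		break;
	}
	closedir(dir);

	return found;
}
```

A caller would pass "/sys/kernel/iommu_groups" to get the IOMMU check.
Matching on names avoids relying on readdir() always returning both
"." and "..", which the counting approach implicitly assumes.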


> +                               /* we have an IOMMU, pick IOVA as VA mode */
> +                               iova_mode = RTE_IOVA_VA;
> +                               RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
> +                       } else {
> +                               /* physical addresses available, and no IOMMU
> +                                * found, so pick IOVA as PA.
> +                                */
> +                               iova_mode = RTE_IOVA_PA;
> +                               RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
> +                       }
>                 }
>  #ifdef RTE_LIBRTE_KNI
>                 /* Workaround for KNI which requires physical address to work */
> diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
> index 501c74f23..92d290284 100644
> --- a/lib/librte_eal/linux/eal/eal_vfio.c
> +++ b/lib/librte_eal/linux/eal/eal_vfio.c
> @@ -2,6 +2,7 @@
>   * Copyright(c) 2010-2018 Intel Corporation
>   */
>
> +#include <dirent.h>
>  #include <inttypes.h>
>  #include <string.h>
>  #include <fcntl.h>
> @@ -19,6 +20,8 @@
>  #include "eal_vfio.h"
>  #include "eal_private.h"
>
> +#define VFIO_KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
> +
>  #ifdef VFIO_PRESENT
>
>  #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
> @@ -2147,3 +2150,30 @@ rte_vfio_container_dma_unmap(__rte_unused int container_fd,
>  }
>
>  #endif /* VFIO_PRESENT */
> +
> +/*
> + * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
> + * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
> + * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
> + * checking if the path is empty will tell us if IOMMU is enabled.
> + */
> +int
> +vfio_iommu_enabled(void)
> +{
> +       DIR *dir = opendir(VFIO_KERNEL_IOMMU_GROUPS_PATH);
> +       struct dirent *d;
> +       int n = 0;
> +
> +       /* if directory doesn't exist, assume IOMMU is not enabled */
> +       if (dir == NULL)
> +               return 0;
> +
> +       while ((d = readdir(dir)) != NULL) {
> +               /* skip dot and dot-dot */
> +               if (++n > 2)
> +                       break;
> +       }
> +       closedir(dir);
> +
> +       return n > 2;
> +}
> diff --git a/lib/librte_eal/linux/eal/eal_vfio.h b/lib/librte_eal/linux/eal/eal_vfio.h
> index cb2d35fb1..58c7a7309 100644
> --- a/lib/librte_eal/linux/eal/eal_vfio.h
> +++ b/lib/librte_eal/linux/eal/eal_vfio.h
> @@ -133,6 +133,8 @@ vfio_has_supported_extensions(int vfio_container_fd);
>
>  int vfio_mp_sync_setup(void);
>
> +int vfio_iommu_enabled(void);
> +
>  #define EAL_VFIO_MP "eal_vfio_mp_sync"
>
>  #define SOCKET_REQ_CONTAINER 0x100
> --
> 2.17.1



-- 
David Marchand


* Re: [dpdk-dev] [PATCH v4] eal: pick IOVA as PA if IOMMU is not available
  2019-07-29  9:31       ` David Marchand
@ 2019-07-29 11:18         ` Burakov, Anatoly
  0 siblings, 0 replies; 17+ messages in thread
From: Burakov, Anatoly @ 2019-07-29 11:18 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, John McNamara, Marko Kovacevic, dariusz.stojaczyk,
	Thomas Monjalon, Jerin Jacob Kollanukkaran

On 29-Jul-19 10:31 AM, David Marchand wrote:
> On Fri, Jul 26, 2019 at 5:37 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
>>
>> When IOMMU is not available, /sys/kernel/iommu_groups will not be
>> populated. This is happening since at least 3.6 when VFIO support
>> was added. If the directory is empty, EAL should not pick IOVA as
>> VA as the default IOVA mode.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
>> Tested-by: Jerin Jacob <jerinj@marvell.com>
>> Reviewed-by: Jerin Jacob <jerinj@marvell.com>
>> ---
>>
>> Notes:
>>      v4:
>>      - Fix indentation in release notes' known issues
>>
>>      v3:
>>      - Add documentation changes
>>      - Fix a typo pointed out by checkpatch
>>
>>      v2:
>>      - Decouple IOMMU from VFIO
>>      - Add a check for physical addresses availability
>>
>>   .../prog_guide/env_abstraction_layer.rst      | 27 +++++++++++------
>>   doc/guides/rel_notes/known_issues.rst         | 26 ++++++++++++++++
>>   doc/guides/rel_notes/release_19_08.rst        | 16 ++++++++++
>>   lib/librte_eal/linux/eal/eal.c                | 21 +++++++++++--
>>   lib/librte_eal/linux/eal/eal_vfio.c           | 30 +++++++++++++++++++
>>   lib/librte_eal/linux/eal/eal_vfio.h           |  2 ++
>>   6 files changed, 111 insertions(+), 11 deletions(-)
>>
>> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
>> index 1487ea550..e6e70e5a8 100644
>> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
>> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
>> @@ -425,6 +425,9 @@ IOVA Mode Detection
>>   IOVA Mode is selected by considering what the current usable Devices on the
>>   system require and/or support.
>>
>> +On FreeBSD, RTE_IOVA_VA mode is not supported, so RTE_IOVA_PA is always used.
> 
> We still allow setting it via --iova-mode=
> Is it really unsupported ? vdev like rings could work.

Oh, right, we don't *really* support IOVA as VA mode, but you can still 
run in --no-huge mode (or using only virtual devices) and pretend like 
it works ;)

Still, I think FreeBSD should default to PA unless it's not running as root.

> 
> 
>> +On Linux, the IOVA mode is detected based on a heuristic.
>> +
>>   Below is the 2-step heuristic for this choice.
> 
> We can combine those two sentences as a single one.

Sure.

> 
> 
>>
>>   For the first step, EAL asks each bus its requirement in terms of IOVA mode
>> @@ -438,20 +441,26 @@ and decides on a preferred IOVA mode.
>>     RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
>>     check on Physical Addresses availability),
>>
>> +If the buses have expressed no preference on which IOVA mode to pick, then a
>> +default is selected using the following logic:
>> +

<snip>

>> @@ -1061,8 +1061,25 @@ rte_eal_init(int argc, char **argv)
>>                  enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>>
>>                  if (iova_mode == RTE_IOVA_DC) {
>> -                       iova_mode = RTE_IOVA_VA;
>> -                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
>> +                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
>> +
>> +                       if (!phys_addrs) {
>> +                               /* if we have no access to physical addresses,
>> +                                * pick IOVA as VA mode.
>> +                                */
>> +                               iova_mode = RTE_IOVA_VA;
>> +                               RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
>> +                       } else if (vfio_iommu_enabled()) {
> 
> How about:
> s/vfio_iommu_enabled/is_iommu_available/
> 
> And the code would move from vfio specific files to eal.c.

Can do.

-- 
Thanks,
Anatoly


* [dpdk-dev] [PATCH v5] eal: pick IOVA as PA if IOMMU is not available
  2019-07-26 15:37     ` [dpdk-dev] [PATCH v4] " Anatoly Burakov
  2019-07-29  9:31       ` David Marchand
@ 2019-07-29 13:52       ` Anatoly Burakov
  2019-07-30  7:21         ` David Marchand
  1 sibling, 1 reply; 17+ messages in thread
From: Anatoly Burakov @ 2019-07-29 13:52 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, dariusz.stojaczyk, thomas,
	david.marchand, jerinj

When IOMMU is not available, /sys/kernel/iommu_groups will not be
populated. This has been the case since at least Linux 3.6, when VFIO
support was added. If the directory is empty, EAL should not pick IOVA
as VA as the default IOVA mode.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Tested-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
---

Notes:
    v5:
    - Clarify docs on FreeBSD
    - Move IOMMU detection code out of VFIO sources
    
    v4:
    - Fix indentation in release notes' known issues
    
    v3:
    - Add documentation changes
    - Fix a typo pointed out by checkpatch
    
    v2:
    - Decouple IOMMU from VFIO
    - Add a check for physical addresses availability

 .../prog_guide/env_abstraction_layer.rst      | 27 ++++++----
 doc/guides/rel_notes/known_issues.rst         | 26 ++++++++++
 doc/guides/rel_notes/release_19_08.rst        | 16 ++++++
 lib/librte_eal/linux/eal/eal.c                | 50 ++++++++++++++++++-
 4 files changed, 107 insertions(+), 12 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 1487ea550..94f30fd5d 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -425,7 +425,8 @@ IOVA Mode Detection
 IOVA Mode is selected by considering what the current usable Devices on the
 system require and/or support.
 
-Below is the 2-step heuristic for this choice.
+On FreeBSD, RTE_IOVA_PA is always the default. On Linux, the IOVA mode is
+detected based on a 2-step heuristic detailed below.
 
 For the first step, EAL asks each bus its requirement in terms of IOVA mode
 and decides on a preferred IOVA mode.
@@ -438,20 +439,26 @@ and decides on a preferred IOVA mode.
   RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
   check on Physical Addresses availability),
 
+If the buses have expressed no preference on which IOVA mode to pick, then a
+default is selected using the following logic:
+
+- if physical addresses are not available, RTE_IOVA_VA mode is used
+- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
+- otherwise, RTE_IOVA_PA mode is used
+
+In the case when the buses had disagreed on their preferred IOVA mode, part of
+the buses won't work because of this decision.
+
 The second step checks if the preferred mode complies with the Physical
 Addresses availability since those are only available to root user in recent
-kernels.
-
-- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
-  Addresses, then EAL init fails early, since later probing of the devices
-  would fail anyway,
-- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
-  In the case when the buses had disagreed on the IOVA Mode at the first step,
-  part of the buses won't work because of this decision.
+kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
+Physical Addresses, then EAL init fails early, since later probing of the
+devices would fail anyway.
 
 .. note::
 
-    The RTE_IOVA_VA mode is selected as the default for the following reasons:
+    The RTE_IOVA_VA mode is preferred as the default in most cases for the
+    following reasons:
 
     - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
       physical address availability.
diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
index 276327c15..0b50c8306 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
@@ -861,3 +861,29 @@ AVX-512 support disabled
 
 **Driver/Module**:
     ALL.
+
+
+Unsuitable IOVA mode may be picked as the default
+----------------------------------------------------------------
+**Description**
+   Not all kernel drivers and not all devices support all IOVA modes. EAL will
+   attempt to pick a reasonable default based on a number of factors, but there
+   may be cases where the default may be unsuitable (for example, hotplugging
+   devices using `igb_uio` driver while having picked IOVA as VA mode on EAL
+   initialization).
+
+**Implication**
+   Some devices (hotplugged or otherwise) may not work due to incompatible IOVA
+   mode being automatically picked by EAL.
+
+**Resolution/Workaround**:
+   It is possible to force EAL to pick a particular IOVA mode by using the
+   `--iova-mode` command-line parameter. If conflicting requirements are present
+   (such as one device requiring IOVA as PA and one requiring IOVA as VA mode),
+   there is no workaround.
+
+**Affected Environment/Platform**:
+   Linux.
+
+**Driver/Module**:
+   ALL.
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index c9bd3ce18..b399ca536 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -56,6 +56,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **EAL will now pick IOVA as VA mode as the default in most cases.**
+
+  Previously, preferred default IOVA mode was selected to be IOVA as PA. The
+  behavior has now been changed to handle IOVA mode detection in a more complex
+  manner, and will default to IOVA as VA in most cases.
+
 * **Added MCS lock.**
 
   MCS lock provides scalability by spinning on a CPU/thread local variable
@@ -436,6 +442,16 @@ Known Issues
    =========================================================
 
 
+* **Unsuitable IOVA mode may be picked as the default**
+
+  Not all kernel drivers and not all devices support all IOVA modes. EAL will
+  attempt to pick a reasonable default based on a number of factors, but
+  the default may be unsuitable in some cases.
+
+  It is recommended to use the `--iova-mode` command-line parameter if the
+  default is not suitable.
+
+
 Tested Platforms
 ----------------
 
diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 34db78753..6ed602c90 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -66,6 +66,8 @@
 
 #define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
 
+#define KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
+
 /* Allow the application to print its usage message too if set */
 static rte_usage_hook_t	rte_application_usage_hook = NULL;
 
@@ -951,6 +953,33 @@ static void rte_eal_init_alert(const char *msg)
 	RTE_LOG(ERR, EAL, "%s\n", msg);
 }
 
+/*
+ * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
+ * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
+ * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
+ * checking if the path is empty will tell us if IOMMU is enabled.
+ */
+static bool
+is_iommu_enabled(void)
+{
+	DIR *dir = opendir(KERNEL_IOMMU_GROUPS_PATH);
+	struct dirent *d;
+	int n = 0;
+
+	/* if directory doesn't exist, assume IOMMU is not enabled */
+	if (dir == NULL)
+		return false;
+
+	while ((d = readdir(dir)) != NULL) {
+		/* skip dot and dot-dot */
+		if (++n > 2)
+			break;
+	}
+	closedir(dir);
+
+	return n > 2;
+}
+
 /* Launch threads, called at application init(). */
 int
 rte_eal_init(int argc, char **argv)
@@ -1061,8 +1090,25 @@ rte_eal_init(int argc, char **argv)
 		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
 
 		if (iova_mode == RTE_IOVA_DC) {
-			iova_mode = RTE_IOVA_VA;
-			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
+			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
+
+			if (!phys_addrs) {
+				/* if we have no access to physical addresses,
+				 * pick IOVA as VA mode.
+				 */
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
+			} else if (is_iommu_enabled()) {
+				/* we have an IOMMU, pick IOVA as VA mode */
+				iova_mode = RTE_IOVA_VA;
+				RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
+			} else {
+				/* physical addresses available, and no IOMMU
+				 * found, so pick IOVA as PA.
+				 */
+				iova_mode = RTE_IOVA_PA;
+				RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
+			}
 		}
 #ifdef RTE_LIBRTE_KNI
 		/* Workaround for KNI which requires physical address to work */
-- 
2.17.1
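
For readers who want to try the detection idea outside of EAL, here is
a minimal standalone sketch of the "is /sys/kernel/iommu_groups
populated?" check described in the patch. The helper names
dir_has_entries() and iommu_groups_populated() are hypothetical and not
part of the patch. Unlike the patch, which counts directory entries and
assumes the first two are "." and "..", this sketch compares entry
names explicitly; it is one possible variation, not the applied code.

```c
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): returns true if the directory
 * at 'path' holds at least one entry other than "." and "..".
 */
static bool
dir_has_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *d;
	bool found = false;

	/* a missing or unreadable directory is treated as empty */
	if (dir == NULL)
		return false;

	while ((d = readdir(dir)) != NULL) {
		/* skip dot and dot-dot by name rather than by position */
		if (strcmp(d->d_name, ".") == 0 ||
				strcmp(d->d_name, "..") == 0)
			continue;
		found = true;
		break;
	}
	closedir(dir);

	return found;
}

/* Apply the check to the sysfs path used by the patch. */
static bool
iommu_groups_populated(void)
{
	return dir_has_entries("/sys/kernel/iommu_groups");
}
```

Taking the path as a parameter is only there to make the check easy to
exercise against an ordinary directory; the patch itself hardcodes the
path via KERNEL_IOMMU_GROUPS_PATH.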

* Re: [dpdk-dev] [PATCH v5] eal: pick IOVA as PA if IOMMU is not available
  2019-07-29 13:52       ` [dpdk-dev] [PATCH v5] " Anatoly Burakov
@ 2019-07-30  7:21         ` David Marchand
  2019-07-30  8:10           ` Thomas Monjalon
  0 siblings, 1 reply; 17+ messages in thread
From: David Marchand @ 2019-07-30  7:21 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, John McNamara, Marko Kovacevic, dariusz.stojaczyk,
	Thomas Monjalon, Jerin Jacob Kollanukkaran

On Mon, Jul 29, 2019 at 5:03 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:
>
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This is happening since at least 3.6 when VFIO support
> was added. If the directory is empty, EAL should not pick IOVA as
> VA as the default IOVA mode.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> Tested-by: Jerin Jacob <jerinj@marvell.com>
> Reviewed-by: Jerin Jacob <jerinj@marvell.com>
> ---
>
> Notes:
>     v5:
>     - Clarify docs on FreeBSD
>     - Move IOMMU detection code out of VFIO sources
>
>     v4:
>     - Fix indentation in release notes' known issues
>
>     v3:
>     - Add documentation changes
>     - Fix a typo pointed out by checkpatch
>
>     v2:
>     - Decouple IOMMU from VFIO
>     - Add a check for physical addresses availability
>
>  .../prog_guide/env_abstraction_layer.rst      | 27 ++++++----
>  doc/guides/rel_notes/known_issues.rst         | 26 ++++++++++
>  doc/guides/rel_notes/release_19_08.rst        | 16 ++++++
>  lib/librte_eal/linux/eal/eal.c                | 50 ++++++++++++++++++-
>  4 files changed, 107 insertions(+), 12 deletions(-)
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 1487ea550..94f30fd5d 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -425,7 +425,8 @@ IOVA Mode Detection
>  IOVA Mode is selected by considering what the current usable Devices on the
>  system require and/or support.
>
> -Below is the 2-step heuristic for this choice.
> +On FreeBSD, RTE_IOVA_PA is always the default. On Linux, the IOVA mode is
> +detected based on a 2-step heuristic detailed below.
>
>  For the first step, EAL asks each bus its requirement in terms of IOVA mode
>  and decides on a preferred IOVA mode.
> @@ -438,20 +439,26 @@ and decides on a preferred IOVA mode.
>    RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
>    check on Physical Addresses availability),
>
> +If the buses have expressed no preference on which IOVA mode to pick, then a
> +default is selected using the following logic:
> +
> +- if physical addresses are not available, RTE_IOVA_VA mode is used
> +- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
> +- otherwise, RTE_IOVA_PA mode is used
> +
> +In the case when the buses had disagreed on their preferred IOVA mode, part of
> +the buses won't work because of this decision.
> +
>  The second step checks if the preferred mode complies with the Physical
>  Addresses availability since those are only available to root user in recent
> -kernels.
> -
> -- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> -  Addresses, then EAL init fails early, since later probing of the devices
> -  would fail anyway,
> -- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
> -  In the case when the buses had disagreed on the IOVA Mode at the first step,
> -  part of the buses won't work because of this decision.
> +kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
> +Physical Addresses, then EAL init fails early, since later probing of the
> +devices would fail anyway.
>
>  .. note::
>
> -    The RTE_IOVA_VA mode is selected as the default for the following reasons:
> +    The RTE_IOVA_VA mode is preferred as the default in most cases for the
> +    following reasons:
>
>      - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
>        physical address availability.
> diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
> index 276327c15..0b50c8306 100644
> --- a/doc/guides/rel_notes/known_issues.rst
> +++ b/doc/guides/rel_notes/known_issues.rst
> @@ -861,3 +861,29 @@ AVX-512 support disabled
>
>  **Driver/Module**:
>      ALL.
> +
> +
> +Unsuitable IOVA mode may be picked as the default
> +----------------------------------------------------------------
> +**Description**:
> +   Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +   attempt to pick a reasonable default based on a number of factors, but the
> +   default may be unsuitable in some cases (for example, when hotplugging
> +   devices that use the `igb_uio` driver after IOVA as VA mode was picked at
> +   EAL initialization).
> +
> +**Implication**:
> +   Some devices (hotplugged or otherwise) may not work due to an incompatible
> +   IOVA mode being automatically picked by EAL.
> +
> +**Resolution/Workaround**:
> +   It is possible to force EAL to pick a particular IOVA mode by using the
> +   `--iova-mode` command-line parameter. If conflicting requirements are present
> +   (such as one device requiring IOVA as PA and one requiring IOVA as VA mode),
> +   there is no workaround.
> +
> +**Affected Environment/Platform**:
> +   Linux.
> +
> +**Driver/Module**:
> +   ALL.
> diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
> index c9bd3ce18..b399ca536 100644
> --- a/doc/guides/rel_notes/release_19_08.rst
> +++ b/doc/guides/rel_notes/release_19_08.rst
> @@ -56,6 +56,12 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
>
> +* **EAL will now pick IOVA as VA mode as the default in most cases.**
> +
> +  Previously, the preferred default IOVA mode was IOVA as PA. IOVA mode
> +  detection now takes more factors into account, and EAL will default to
> +  IOVA as VA in most cases.
> +
>  * **Added MCS lock.**
>
>    MCS lock provides scalability by spinning on a CPU/thread local variable
> @@ -436,6 +442,16 @@ Known Issues
>     =========================================================
>
>
> +* **Unsuitable IOVA mode may be picked as the default**
> +
> +  Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +  attempt to pick a reasonable default based on a number of factors, but
> +  the default may be unsuitable in some cases.
> +
> +  It is recommended to use the `--iova-mode` command-line parameter if the
> +  default is not suitable.
> +
> +
>  Tested Platforms
>  ----------------
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 34db78753..6ed602c90 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -66,6 +66,8 @@
>
>  #define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
>
> +#define KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
> +
>  /* Allow the application to print its usage message too if set */
>  static rte_usage_hook_t        rte_application_usage_hook = NULL;
>
> @@ -951,6 +953,33 @@ static void rte_eal_init_alert(const char *msg)
>         RTE_LOG(ERR, EAL, "%s\n", msg);
>  }
>
> +/*
> + * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
> + * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
> + * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
> + * checking if the path is empty will tell us if IOMMU is enabled.
> + */
> +static bool
> +is_iommu_enabled(void)
> +{
> +       DIR *dir = opendir(KERNEL_IOMMU_GROUPS_PATH);
> +       struct dirent *d;
> +       int n = 0;
> +
> +       /* if directory doesn't exist, assume IOMMU is not enabled */
> +       if (dir == NULL)
> +               return false;
> +
> +       while ((d = readdir(dir)) != NULL) {
> +               /* skip dot and dot-dot */
> +               if (++n > 2)
> +                       break;
> +       }
> +       closedir(dir);
> +
> +       return n > 2;
> +}
> +
>  /* Launch threads, called at application init(). */
>  int
>  rte_eal_init(int argc, char **argv)
> @@ -1061,8 +1090,25 @@ rte_eal_init(int argc, char **argv)
>                 enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>
>                 if (iova_mode == RTE_IOVA_DC) {
> -                       iova_mode = RTE_IOVA_VA;
> -                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
> +                       RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
> +
> +                       if (!phys_addrs) {
> +                               /* if we have no access to physical addresses,
> +                                * pick IOVA as VA mode.
> +                                */
> +                               iova_mode = RTE_IOVA_VA;
> +                               RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
> +                       } else if (is_iommu_enabled()) {
> +                               /* we have an IOMMU, pick IOVA as VA mode */
> +                               iova_mode = RTE_IOVA_VA;
> +                               RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
> +                       } else {
> +                               /* physical addresses available, and no IOMMU
> +                                * found, so pick IOVA as PA.
> +                                */
> +                               iova_mode = RTE_IOVA_PA;
> +                               RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
> +                       }
>                 }
>  #ifdef RTE_LIBRTE_KNI
>                 /* Workaround for KNI which requires physical address to work */
> --
> 2.17.1

Reviewed-by: David Marchand <david.marchand@redhat.com>


-- 
David Marchand

* Re: [dpdk-dev] [PATCH v5] eal: pick IOVA as PA if IOMMU is not available
  2019-07-30  7:21         ` David Marchand
@ 2019-07-30  8:10           ` Thomas Monjalon
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Monjalon @ 2019-07-30  8:10 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, David Marchand, John McNamara, Marko Kovacevic,
	dariusz.stojaczyk, Jerin Jacob Kollanukkaran

30/07/2019 09:21, David Marchand:
> On Mon, Jul 29, 2019 at 5:03 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
> >
> > When IOMMU is not available, /sys/kernel/iommu_groups will not be
> > populated. This is happening since at least 3.6 when VFIO support
> > was added. If the directory is empty, EAL should not pick IOVA as
> > VA as the default IOVA mode.
> >
> > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > Tested-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> > Tested-by: Jerin Jacob <jerinj@marvell.com>
> > Reviewed-by: Jerin Jacob <jerinj@marvell.com>
> 
> Reviewed-by: David Marchand <david.marchand@redhat.com>

Applied, with a few whitespace fixes and minor changes, thanks.
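
As a summary of the v5 patch as posted, the default selection when the
buses express no preference can be sketched as a small pure function.
This is an illustration only: the enum below is a local stand-in for
DPDK's enum rte_iova_mode, and pick_default_iova() is a hypothetical
name that does not exist in EAL. The failure path for "IOVA as PA
requested but physical addresses unavailable" is not modeled here.

```c
#include <stdbool.h>

/*
 * Local stand-in for DPDK's enum rte_iova_mode; the names and values
 * are illustrative and not copied from the DPDK headers.
 */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Hypothetical helper mirroring the if/else chain in the patch:
 *  - a bus preference other than "don't care" is kept as-is
 *  - no access to physical addresses -> IOVA as VA
 *  - IOMMU available                 -> IOVA as VA
 *  - otherwise                       -> IOVA as PA
 */
static enum iova_mode
pick_default_iova(enum iova_mode bus_pref, bool phys_addrs, bool iommu)
{
	if (bus_pref != IOVA_DC)
		return bus_pref;
	if (!phys_addrs)
		return IOVA_VA;
	if (iommu)
		return IOVA_VA;
	return IOVA_PA;
}
```

Only the combination "physical addresses available, no IOMMU" ends up
in IOVA as PA mode, which matches the "IOVA as VA in most cases"
wording in the release notes. Users who need a different mode can
still force it with the --iova-mode option.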



Thread overview: 17+ messages
2019-07-24 16:46 [dpdk-dev] [PATCH] eal: pick IOVA as PA if IOMMU is not available Anatoly Burakov
2019-07-25  8:05 ` David Marchand
2019-07-25  9:31   ` Burakov, Anatoly
2019-07-25  9:35     ` David Marchand
2019-07-25  9:38       ` Burakov, Anatoly
2019-07-25  9:40         ` Burakov, Anatoly
2019-07-25 18:58   ` Thomas Monjalon
2019-07-25  9:52 ` [dpdk-dev] [PATCH v2] " Anatoly Burakov
2019-07-25  9:56   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-25 11:05   ` [dpdk-dev] [PATCH v3] " Anatoly Burakov
2019-07-26  5:08     ` Stojaczyk, Dariusz
2019-07-26 15:37     ` [dpdk-dev] [PATCH v4] " Anatoly Burakov
2019-07-29  9:31       ` David Marchand
2019-07-29 11:18         ` Burakov, Anatoly
2019-07-29 13:52       ` [dpdk-dev] [PATCH v5] " Anatoly Burakov
2019-07-30  7:21         ` David Marchand
2019-07-30  8:10           ` Thomas Monjalon
