DPDK patches and discussions
From: David Marchand <david.marchand@redhat.com>
To: dev@dpdk.org
Cc: benjamin.walker@intel.com, jerinj@marvell.com,
	anatoly.burakov@intel.com, maxime.coquelin@redhat.com,
	thomas@monjalon.net,
	Bruce Richardson <bruce.richardson@intel.com>
Subject: [dpdk-dev] [PATCH v2 2/3] eal: compute IOVA mode based on PA availability
Date: Fri, 14 Jun 2019 11:39:16 +0200
Message-ID: <1560505157-9769-3-git-send-email-david.marchand@redhat.com>
In-Reply-To: <1560505157-9769-1-git-send-email-david.marchand@redhat.com>

From: Ben Walker <benjamin.walker@intel.com>

Currently, if the bus selects IOVA as PA, memory initialization can fail
when physical addresses are not accessible.
Because this is the default behavior, it can be quite hard for ordinary
users to understand what went wrong.

Catch this situation earlier in EAL init by validating the availability
of physical addresses, or select the IOVA mode based on that availability
when no clear preference has been expressed.

The bus code is changed so that it reports when it does not care about
the IOVA mode and lets EAL init decide.
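A minimal sketch of the resulting decision, in plain C with hypothetical
names (iova_mode, select_iova_mode) rather than the real EAL types, and
assuming the bus layer has already reduced its answer to a single value:
an explicit --iova-mode wins, a "don't care" from the buses is resolved
from physical address availability, and PA without physical addresses is
rejected up front instead of failing later during memory init.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the IOVA modes: DC ("don't care"), PA, VA. */
enum iova_mode {
	IOVA_DC = 0,
	IOVA_PA = 1 << 0,
	IOVA_VA = 1 << 1,
};

/*
 * Hypothetical helper: user_mode is the --iova-mode option (IOVA_DC if
 * absent), bus_mode is what the buses reported. Returns 0 and stores the
 * chosen mode in *out, or -1 if PA would be used without physical
 * addresses being available.
 */
static int
select_iova_mode(enum iova_mode user_mode, enum iova_mode bus_mode,
		 bool phys_addrs, enum iova_mode *out)
{
	enum iova_mode mode = user_mode;

	assert(out != NULL);

	if (mode == IOVA_DC) {
		/* No explicit option: defer to the buses. */
		mode = bus_mode;
		/* Buses expressed no preference: decide on PA availability. */
		if (mode == IOVA_DC)
			mode = phys_addrs ? IOVA_PA : IOVA_VA;
	}

	/* Fail early instead of during memory init. */
	if (mode == IOVA_PA && !phys_addrs)
		return -1;

	*out = mode;
	return 0;
}
```

The final check applies whether the mode came from the option or from
autodetection, so an explicit PA request on a system without physical
address access is also refused at init time.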

In the Linux implementation, rework rte_eal_using_phys_addrs() so that it
can be called earlier while still avoiding a circular dependency with
rte_mem_virt2phy().
In the FreeBSD implementation, rte_eal_using_phys_addrs() always returns
false, so the detection part is left as is.
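The reworked Linux detection relies on a tri-state cache (-1 = not probed
yet, 0 = unavailable, 1 = available), and rte_mem_virt2phy() tests the
cached value directly and bails out only on 0, so the first probe can call
it without the two functions recursing into each other. A self-contained
sketch of that pattern; virt2phy(), using_phys_addrs() and the fake_*
switches are hypothetical stand-ins for the hugepage check and the
/proc/self/pagemap lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BAD_PHYS_ADDR UINT64_MAX /* stands in for RTE_BAD_PHYS_ADDR */

/* -1: not probed yet, 0: unavailable, 1: available. */
static int phys_addrs_available = -1;

/* Hypothetical switches replacing the real hugepage/pagemap checks. */
static bool fake_has_hugepages = true;
static bool fake_pagemap_readable = true;

static uint64_t
virt2phy(const void *virtaddr)
{
	/* Read the cache directly (not via using_phys_addrs()) and
	 * short-circuit only on 0: while the state is still -1 the probe
	 * below may call in here without any recursion. */
	if (phys_addrs_available == 0)
		return BAD_PHYS_ADDR;
	if (!fake_pagemap_readable)
		return BAD_PHYS_ADDR;
	/* Pretend identity mapping for the sketch. */
	return (uint64_t)(uintptr_t)virtaddr;
}

static int
using_phys_addrs(void)
{
	if (phys_addrs_available == -1) {
		uint64_t tmp = 0;

		phys_addrs_available = (fake_has_hugepages &&
			virt2phy(&tmp) != BAD_PHYS_ADDR) ? 1 : 0;
	}
	assert(phys_addrs_available == 0 || phys_addrs_available == 1);
	return phys_addrs_available;
}
```

Once probed, the answer is cached, so later calls (for example from the
hugepage setup paths) do not touch the pagemap again.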

If librte_kni is compiled in, the KNI kernel module is loaded and the
buses requested VA:
- if physical addresses are available, force PA, as was done before;
- otherwise, keep IOVA as VA; KNI initialization will then fail later.
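The KNI rule can be sketched as a small adjustment applied once the
autodetected mode is known. In the patch it only runs when the mode was
autodetected (no --iova-mode) and RTE_LIBRTE_KNI is defined.
apply_kni_workaround() is hypothetical, and kni_loaded stands in for
rte_eal_check_module("rte_kni") == 1.

```c
#include <assert.h>
#include <stdbool.h>

enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* Hypothetical helper: mode has already been resolved from DC. */
static enum iova_mode
apply_kni_workaround(enum iova_mode mode, bool kni_loaded, bool phys_addrs)
{
	assert(mode == IOVA_PA || mode == IOVA_VA);

	/* KNI needs physical addresses: switch only when they are usable. */
	if (mode == IOVA_VA && kni_loaded && phys_addrs)
		return IOVA_PA;
	/* Otherwise keep the mode; KNI will refuse to initialize later. */
	return mode;
}
```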

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 lib/librte_eal/common/eal_common_bus.c  |  4 ---
 lib/librte_eal/common/include/rte_bus.h |  2 +-
 lib/librte_eal/freebsd/eal/eal.c        | 10 +++++--
 lib/librte_eal/linux/eal/eal.c          | 38 +++++++++++++++++++++------
 lib/librte_eal/linux/eal/eal_memory.c   | 46 +++++++++------------------------
 5 files changed, 51 insertions(+), 49 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index c8f1901..77f1be1 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -237,10 +237,6 @@ enum rte_iova_mode
 			mode |= bus->get_iommu_class();
 	}
 
-	if (mode != RTE_IOVA_VA) {
-		/* Use default IOVA mode */
-		mode = RTE_IOVA_PA;
-	}
 	return mode;
 }
 
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 4faf2d2..90fe4e9 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -392,7 +392,7 @@ struct rte_bus *rte_bus_find(const struct rte_bus *start, rte_bus_cmp_t cmp,
 
 /**
  * Get the common iommu class of devices bound on to buses available in the
- * system. The default mode is PA.
+ * system. RTE_IOVA_DC means that no preference has been expressed.
  *
  * @return
  *     enum rte_iova_mode value.
diff --git a/lib/librte_eal/freebsd/eal/eal.c b/lib/librte_eal/freebsd/eal/eal.c
index 4eaa531..231f1dc 100644
--- a/lib/librte_eal/freebsd/eal/eal.c
+++ b/lib/librte_eal/freebsd/eal/eal.c
@@ -689,13 +689,19 @@ static void rte_eal_init_alert(const char *msg)
 	/* if no EAL option "--iova-mode=<pa|va>", use bus IOVA scheme */
 	if (internal_config.iova_mode == RTE_IOVA_DC) {
 		/* autodetect the IOVA mapping mode (default is RTE_IOVA_PA) */
-		rte_eal_get_configuration()->iova_mode =
-			rte_bus_get_iommu_class();
+		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
+
+		if (iova_mode == RTE_IOVA_DC)
+			iova_mode = RTE_IOVA_PA;
+		rte_eal_get_configuration()->iova_mode = iova_mode;
 	} else {
 		rte_eal_get_configuration()->iova_mode =
 			internal_config.iova_mode;
 	}
 
+	RTE_LOG(INFO, EAL, "Selected IOVA mode '%s'\n",
+		rte_eal_iova_mode() == RTE_IOVA_PA ? "PA" : "VA");
+
 	if (internal_config.no_hugetlbfs == 0) {
 		/* rte_config isn't initialized yet */
 		ret = internal_config.process_type == RTE_PROC_PRIMARY ?
diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 3e1d6eb..785ed2b 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -948,6 +948,7 @@ static void rte_eal_init_alert(const char *msg)
 	static char logid[PATH_MAX];
 	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
 	char thread_name[RTE_MAX_THREAD_NAME_LEN];
+	bool phys_addrs;
 
 	/* checks if the machine is adequate */
 	if (!rte_cpu_is_supported()) {
@@ -1035,25 +1036,46 @@ static void rte_eal_init_alert(const char *msg)
 		return -1;
 	}
 
+	phys_addrs = rte_eal_using_phys_addrs() != 0;
+
 	/* if no EAL option "--iova-mode=<pa|va>", use bus IOVA scheme */
 	if (internal_config.iova_mode == RTE_IOVA_DC) {
-		/* autodetect the IOVA mapping mode (default is RTE_IOVA_PA) */
-		rte_eal_get_configuration()->iova_mode =
-			rte_bus_get_iommu_class();
+		/* autodetect the IOVA mapping mode */
+		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
 
+		if (iova_mode == RTE_IOVA_DC) {
+			iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA;
+			RTE_LOG(DEBUG, EAL,
+				"Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n",
+				phys_addrs ? "PA" : "VA");
+		}
+#ifdef RTE_LIBRTE_KNI
 		/* Workaround for KNI which requires physical address to work */
-		if (rte_eal_get_configuration()->iova_mode == RTE_IOVA_VA &&
+		if (iova_mode == RTE_IOVA_VA &&
 				rte_eal_check_module("rte_kni") == 1) {
-			rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
-			RTE_LOG(WARNING, EAL,
-				"Some devices want IOVA as VA but PA will be used because.. "
-				"KNI module inserted\n");
+			if (phys_addrs) {
+				iova_mode = RTE_IOVA_PA;
+				RTE_LOG(WARNING, EAL, "Forcing IOVA as 'PA' because KNI module is loaded\n");
+			} else {
+				RTE_LOG(DEBUG, EAL, "KNI can not work since physical addresses are unavailable\n");
+			}
 		}
+#endif
+		rte_eal_get_configuration()->iova_mode = iova_mode;
 	} else {
 		rte_eal_get_configuration()->iova_mode =
 			internal_config.iova_mode;
 	}
 
+	if (rte_eal_iova_mode() == RTE_IOVA_PA && !phys_addrs) {
+		rte_eal_init_alert("Cannot use IOVA as 'PA' since physical addresses are not available");
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	RTE_LOG(INFO, EAL, "Selected IOVA mode '%s'\n",
+		rte_eal_iova_mode() == RTE_IOVA_PA ? "PA" : "VA");
+
 	if (internal_config.no_hugetlbfs == 0) {
 		/* rte_config isn't initialized yet */
 		ret = internal_config.process_type == RTE_PROC_PRIMARY ?
diff --git a/lib/librte_eal/linux/eal/eal_memory.c b/lib/librte_eal/linux/eal/eal_memory.c
index 1853ace..25c4145 100644
--- a/lib/librte_eal/linux/eal/eal_memory.c
+++ b/lib/librte_eal/linux/eal/eal_memory.c
@@ -65,34 +65,10 @@
  * zone as well as a physical contiguous zone.
  */
 
-static bool phys_addrs_available = true;
+static int phys_addrs_available = -1;
 
 #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space"
 
-static void
-test_phys_addrs_available(void)
-{
-	uint64_t tmp = 0;
-	phys_addr_t physaddr;
-
-	if (!rte_eal_has_hugepages()) {
-		RTE_LOG(ERR, EAL,
-			"Started without hugepages support, physical addresses not available\n");
-		phys_addrs_available = false;
-		return;
-	}
-
-	physaddr = rte_mem_virt2phy(&tmp);
-	if (physaddr == RTE_BAD_PHYS_ADDR) {
-		if (rte_eal_iova_mode() == RTE_IOVA_PA)
-			RTE_LOG(ERR, EAL,
-				"Cannot obtain physical addresses: %s. "
-				"Only vfio will function.\n",
-				strerror(errno));
-		phys_addrs_available = false;
-	}
-}
-
 /*
  * Get physical address of any mapped virtual address in the current process.
  */
@@ -105,8 +81,7 @@
 	int page_size;
 	off_t offset;
 
-	/* Cannot parse /proc/self/pagemap, no need to log errors everywhere */
-	if (!phys_addrs_available)
+	if (phys_addrs_available == 0)
 		return RTE_BAD_IOVA;
 
 	/* standard page size */
@@ -1336,8 +1311,6 @@ void numa_error(char *where)
 	int nr_hugefiles, nr_hugepages = 0;
 	void *addr;
 
-	test_phys_addrs_available();
-
 	memset(used_hp, 0, sizeof(used_hp));
 
 	/* get pointer to global configuration */
@@ -1516,7 +1489,7 @@ void numa_error(char *where)
 				continue;
 		}
 
-		if (phys_addrs_available &&
+		if (rte_eal_using_phys_addrs() &&
 				rte_eal_iova_mode() != RTE_IOVA_VA) {
 			/* find physical addresses for each hugepage */
 			if (find_physaddrs(&tmp_hp[hp_offset], hpi) < 0) {
@@ -1735,8 +1708,6 @@ void numa_error(char *where)
 	uint64_t memory[RTE_MAX_NUMA_NODES];
 	int hp_sz_idx, socket_id;
 
-	test_phys_addrs_available();
-
 	memset(used_hp, 0, sizeof(used_hp));
 
 	for (hp_sz_idx = 0;
@@ -1879,8 +1850,6 @@ void numa_error(char *where)
 				"into secondary processes\n");
 	}
 
-	test_phys_addrs_available();
-
 	fd_hugepage = open(eal_hugepage_data_path(), O_RDONLY);
 	if (fd_hugepage < 0) {
 		RTE_LOG(ERR, EAL, "Could not open %s\n",
@@ -2020,6 +1989,15 @@ void numa_error(char *where)
 int
 rte_eal_using_phys_addrs(void)
 {
+	if (phys_addrs_available == -1) {
+		uint64_t tmp = 0;
+
+		if (rte_eal_has_hugepages() != 0 &&
+		    rte_mem_virt2phy(&tmp) != RTE_BAD_PHYS_ADDR)
+			phys_addrs_available = 1;
+		else
+			phys_addrs_available = 0;
+	}
 	return phys_addrs_available;
 }
 
-- 
1.8.3.1
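The eal_common_bus.c hunk drops the PA fallback, so the result is simply
the bitwise OR of what each bus reports. A sketch of that accumulation,
assuming DC is 0 and PA and VA are distinct bits (which is what the
`mode |=` loop relies on); combine_bus_modes() is hypothetical. It also
shows that buses disagreeing (one PA, one VA) yield PA|VA, a value equal
to none of DC, PA or VA, which callers comparing against those constants
have to account for.

```c
#include <assert.h>
#include <stddef.h>

enum iova_mode { IOVA_DC = 0, IOVA_PA = 1 << 0, IOVA_VA = 1 << 1 };

/* Hypothetical stand-in for rte_bus_get_iommu_class() after this patch:
 * OR together each bus's answer, with no PA default applied. */
static unsigned int
combine_bus_modes(const enum iova_mode *bus_modes, size_t n)
{
	unsigned int mode = IOVA_DC;
	size_t i;

	assert(n == 0 || bus_modes != NULL);
	for (i = 0; i < n; i++)
		mode |= bus_modes[i];
	return mode;
}
```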



Thread overview: 47+ messages
2019-05-30 17:48 [dpdk-dev] eal/pci: Improve automatic selection of IOVA mode Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 01/12] eal: Make rte_eal_using_phys_addrs work sooner Ben Walker
2019-05-30 21:29   ` [dpdk-dev] [PATCH v2 " Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 02/12] eal/pci: Inline several functions into rte_pci_get_iommu_class Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 03/12] eal/pci: Rework loops in rte_pci_get_iommu_class Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 04/12] eal/pci: Collapse two " Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 05/12] eal/pci: Add function pci_ignore_device Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 06/12] eal/pci: Correctly test whitelist/blacklist in rte_pci_get_iommu_class Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 07/12] eal/pci: Reverse if check " Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 08/12] eal/pci: Collapse loops " Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 09/12] eal/pci: Simplify rte_pci_get_iommu class by using a switch Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 10/12] eal/pci: Finding a device bound to UIO does not force PA Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 11/12] eal/pci: rte_pci_get_iommu_class handles no drivers Ben Walker
2019-05-30 21:29     ` [dpdk-dev] [PATCH v2 12/12] eal: If bus can't decide PA or VA, try to access PA Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 02/12] eal/pci: Inline several functions into rte_pci_get_iommu_class Ben Walker
2019-05-30 17:57   ` Stephen Hemminger
2019-05-30 18:09     ` Walker, Benjamin
2019-05-30 17:48 ` [dpdk-dev] [PATCH 03/12] eal/pci: Rework loops in rte_pci_get_iommu_class Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 04/12] eal/pci: Collapse two " Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 05/12] eal/pci: Add function pci_ignore_device Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 06/12] eal/pci: Correctly test whitelist/blacklist in rte_pci_get_iommu_class Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 07/12] eal/pci: Reverse if check " Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 08/12] eal/pci: Collapse loops " Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 09/12] eal/pci: Simplify rte_pci_get_iommu class by using a switch Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 10/12] eal/pci: Finding a device bound to UIO does not force PA Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 11/12] eal/pci: rte_pci_get_iommu_class handles no drivers Ben Walker
2019-05-30 17:48 ` [dpdk-dev] [PATCH 12/12] eal: If bus can't decide PA or VA, try to access PA Ben Walker
2019-06-03 10:48 ` [dpdk-dev] eal/pci: Improve automatic selection of IOVA mode David Marchand
2019-06-03 16:44   ` Walker, Benjamin
2019-06-14  8:42     ` David Marchand
2019-06-14  9:39 ` [dpdk-dev] [PATCH v2 0/3] " David Marchand
2019-06-14  9:39   ` [dpdk-dev] [PATCH v2 1/3] kni: refuse to initialise when IOVA is not PA David Marchand
2019-06-14  9:39   ` David Marchand [this message]
2019-07-03 10:17     ` [dpdk-dev] [PATCH v2 2/3] eal: compute IOVA mode based on PA availability Burakov, Anatoly
2019-07-04  7:13       ` David Marchand
2019-06-14  9:39   ` [dpdk-dev] [PATCH v2 3/3] bus/pci: only consider usable devices to select IOVA mode David Marchand
2019-07-03 10:45     ` Burakov, Anatoly
2019-07-04  9:18       ` David Marchand
2019-07-04 10:43         ` Burakov, Anatoly
2019-07-04 10:47           ` David Marchand
2019-07-04 17:14     ` Stephen Hemminger
2019-07-05  7:58       ` David Marchand
2019-07-05 16:27         ` Stephen Hemminger
2019-07-05  8:26       ` Thomas Monjalon
2019-06-27 17:05   ` [dpdk-dev] [PATCH v2 0/3] Improve automatic selection of " Thomas Monjalon
2019-07-02 14:18     ` Thomas Monjalon
2019-07-05 14:57   ` Thomas Monjalon
