DPDK patches and discussions
* [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
@ 2018-07-04 12:53 Alejandro Lucero
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
                   ` (5 more replies)
  0 siblings, 6 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-04 12:53 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov, maxime.coquelin, ferruh.yigit

This patchset adds, mainly, a check for ensuring IOVAs are within a
restricted range due to addressing limitations with some devices. There
are two known cases: NFP and IOMMU VT-d emulation.

With this check, out-of-range IOVAs are detected and PMDs can abort
initialization. For the VT-d case, IOVA VA mode is allowed as long as
IOVAs are within the supported range, instead of forbidding IOVA VA by
default.

In the known cases of addressing limitations, there are just 40 (NFP) or
39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough for
most systems. On machines with more memory, the added check will
ensure IOVAs are within the range.

With IOVA VA, and because of the way the Linux kernel serves mmap calls
on 64-bit systems, 39 or 40 bits are not enough. It is possible to
give an address hint with a lower starting address than the default one
used by the kernel, and then ensure the mmap uses that hint or the hint
plus some offset. On 64-bit systems, the process virtual address space is
large enough for doing the hugepage mmapping within the supported range
when those addressing limitations exist. This patchset also adds a change
for using such a hint, making the use of IOVA VA a more than likely
possibility when there are those addressing limitations.

The check is not done by default but just when it is required. This
patchset adds the check for NFP initialization and for setting the IOVA
mode if an emulated VT-d is detected.

This patchset applies on 17.11.3.

Similar changes will be submitted to main DPDK branch soon.

v2:
 - add get_addr_hint function
 - call munmap when hint given and not used by mmap
 - create dma mask in one step
 - refactor logs

v3:
 - add new API functions to map files
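
The limits quoted in this cover letter follow directly from the number of
address bits. A minimal sketch of that arithmetic; the helper names are
hypothetical and not part of the patchset:

```c
#include <stdint.h>

/* Number of bytes addressable with `bits` address bits (valid for bits < 64). */
static inline uint64_t
addressable_bytes(unsigned int bits)
{
	return 1ULL << bits;
}

/* Same quantity expressed in GB (valid for 30 <= bits < 64). */
static inline uint64_t
addressable_gb(unsigned int bits)
{
	return addressable_bytes(bits) >> 30;
}
```

With 40 bits this gives 1024GB (1TB) and with 39 bits 512GB. An address such
as 0x7f0000000000, in the region where 64-bit Linux typically places mmaps,
needs 47 bits, so it is outside both ranges.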


* [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
  2018-07-04 12:53 [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
@ 2018-07-04 12:53 ` Alejandro Lucero
  2018-07-10  8:56   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 2/6] ethdev: add function for checking IOVAs by a device Alejandro Lucero
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-04 12:53 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov, maxime.coquelin, ferruh.yigit

A device can suffer addressing limitations. This function checks that
memsegs have IOVAs within the supported range based on the DMA mask.

A PMD should use this during initialization if its supported devices
suffer addressing limitations, returning an error if this function
reports memsegs out of range.

Another potential usage is for emulated IOMMU hardware with addressing
limitations.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_memory.c  | 33 ++++++++++++++++++++++++++++++
 lib/librte_eal/common/include/rte_memory.h |  3 +++
 lib/librte_eal/rte_eal_version.map         |  1 +
 3 files changed, 37 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index fc6c44d..f5efebe 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -109,6 +109,39 @@
 	}
 }
 
+/* check memseg iovas are within the required range based on dma mask */
+int
+rte_eal_check_dma_mask(uint8_t maskbits)
+{
+
+	const struct rte_mem_config *mcfg;
+	uint64_t mask;
+	int i;
+
+	/* create dma mask */
+	mask = ~((1ULL << maskbits) - 1);
+
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
+
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		if (mcfg->memseg[i].addr == NULL)
+			break;
+
+		if (mcfg->memseg[i].iova & mask) {
+			RTE_LOG(INFO, EAL,
+				"memseg[%d] iova %"PRIx64" out of range:\n",
+				i, mcfg->memseg[i].iova);
+
+			RTE_LOG(INFO, EAL, "\tusing dma mask %"PRIx64"\n",
+				mask);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
 /* return the number of memory channels */
 unsigned rte_memory_get_nchannel(void)
 {
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 80a8fc0..b2a0168 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -209,6 +209,9 @@ struct rte_memseg {
  */
 unsigned rte_memory_get_nrank(void);
 
+/* check memsegs iovas are within a range based on dma mask */
+int rte_eal_check_dma_mask(uint8_t maskbits);
+
 /**
  * Drivers based on uio will not load unless physical
  * addresses are obtainable. It is only possible to get
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index f4f46c1..aa6cf87 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -184,6 +184,7 @@ DPDK_17.11 {
 
 	rte_eal_create_uio_dev;
 	rte_bus_get_iommu_class;
+	rte_eal_check_dma_mask;
 	rte_eal_has_pci;
 	rte_eal_iova_mode;
 	rte_eal_mbuf_default_mempool_ops;
-- 
1.9.1
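
The check added by this patch can be tried outside DPDK with a small
stand-alone sketch. `struct seg` and `check_dma_mask` are hypothetical
stand-ins for `struct rte_memseg` and `rte_eal_check_dma_mask`, and the
sketch assumes 0 < maskbits < 64, as the patch does implicitly:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rte_memseg. */
struct seg {
	void *addr;     /* virtual address, NULL marks the end of the table */
	uint64_t iova;  /* IO virtual address seen by the device */
};

/*
 * Return 0 if every used segment has an IOVA representable in `maskbits`
 * bits, -1 otherwise. Assumes 0 < maskbits < 64 (a shift by 64 is undefined).
 */
static int
check_dma_mask(const struct seg *segs, size_t nsegs, uint8_t maskbits)
{
	/* bits that must be zero in every IOVA */
	uint64_t mask = ~((1ULL << maskbits) - 1);
	size_t i;

	for (i = 0; i < nsegs; i++) {
		if (segs[i].addr == NULL)
			break;
		if (segs[i].iova & mask)
			return -1;
	}
	return 0;
}
```

A segment with IOVA 1ULL << 39 passes a 40-bit check but fails a 39-bit one,
since bit 39 is the first bit outside a 39-bit range.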


* [dpdk-dev] [PATCH v3 2/6] ethdev: add function for checking IOVAs by a device
  2018-07-04 12:53 [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
@ 2018-07-04 12:53 ` Alejandro Lucero
  2018-07-07 17:30   ` Andrew Rybchenko
  2018-07-10  8:57   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 3/6] bus/pci: use IOVAs check when setting IOVA mode Alejandro Lucero
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-04 12:53 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov, maxime.coquelin, ferruh.yigit

A PMD should invoke this function to check that memseg IOVAs are within
the range supported by the device.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 lib/librte_ether/rte_ethdev.h           | 13 +++++++++++++
 lib/librte_ether/rte_ethdev_version.map |  1 +
 2 files changed, 14 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index eba11ca..e51a432 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2799,6 +2799,19 @@ int rte_eth_dev_set_vlan_ether_type(uint16_t port_id,
 int rte_eth_dev_set_vlan_pvid(uint16_t port_id, uint16_t pvid, int on);
 
 /**
+ * Check that memseg IOVAs are within the range implied by the device dma mask.
+ *
+ * @param maskbits
+ *  mask length in bits
+ *
+ */
+static inline int
+rte_eth_dev_check_dma_mask(uint8_t maskbits)
+{
+	return rte_eal_check_dma_mask(maskbits);
+}
+
+/**
  *
  * Retrieve a burst of input packets from a receive queue of an Ethernet
  * device. The retrieved packets are stored in *rte_mbuf* structures whose
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index e9681ac..0b11b8a 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -191,6 +191,7 @@ DPDK_17.08 {
 DPDK_17.11 {
 	global:
 
+	rte_eth_dev_check_dma_mask;
 	rte_eth_dev_get_sec_ctx;
 	rte_eth_dev_pool_ops_supported;
 	rte_eth_dev_reset;
-- 
1.9.1


* [dpdk-dev] [PATCH v3 3/6] bus/pci: use IOVAs check when setting IOVA mode
  2018-07-04 12:53 [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 2/6] ethdev: add function for checking IOVAs by a device Alejandro Lucero
@ 2018-07-04 12:53 ` Alejandro Lucero
  2018-07-10 10:14   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 4/6] mem: use address hint for mapping hugepages Alejandro Lucero
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-04 12:53 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov, maxime.coquelin, ferruh.yigit

Although VT-d emulation currently only supports 39 bits, the IOVAs in
use may still be within that supported range. This patch allows
IOVA VA mode in such a case.

Indeed, memory initialization code can be modified to use lower
virtual addresses than those used by the kernel for 64-bit processes
by default, and therefore memseg IOVAs can use 39 bits or less on
most systems. This is very likely true for VMs.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 drivers/bus/pci/linux/pci.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 74deef3..792c819 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -43,6 +43,7 @@
 #include <rte_devargs.h>
 #include <rte_memcpy.h>
 #include <rte_vfio.h>
+#include <rte_memory.h>
 
 #include "eal_private.h"
 #include "eal_filesystem.h"
@@ -613,10 +614,12 @@
 	fclose(fp);
 
 	mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 1;
-	if (mgaw < X86_VA_WIDTH)
+
+	if (!rte_eal_check_dma_mask(mgaw))
+		return true;
+	else
 		return false;
 
-	return true;
 }
 #elif defined(RTE_ARCH_PPC_64)
 static bool
@@ -640,13 +643,17 @@
 {
 	struct rte_pci_device *dev = NULL;
 	struct rte_pci_driver *drv = NULL;
+	int iommu_dma_mask_check_done = 0;
 
 	FOREACH_DRIVER_ON_PCIBUS(drv) {
 		FOREACH_DEVICE_ON_PCIBUS(dev) {
 			if (!rte_pci_match(drv, dev))
 				continue;
-			if (!pci_one_device_iommu_support_va(dev))
-				return false;
+			if (!iommu_dma_mask_check_done) {
+				if (pci_one_device_iommu_support_va(dev) < 0)
+					return false;
+				iommu_dma_mask_check_done  = 1;
+			}
 		}
 	}
 	return true;
-- 
1.9.1
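
One detail in the last hunk may deserve attention: the per-device helper is
tested with `< 0`, while the first hunk shows it returning `true` or `false`.
A bool promoted to int is 0 or 1, so that comparison can never hold. A minimal
sketch of what the loop appears to intend, with the check run once and tested
as a bool; the types and names below are hypothetical stand-ins, not the
bus/pci code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a matched device on the bus. */
struct fake_dev {
	bool iommu_supports_va;
};

/* Stand-in for the per-device helper, returning bool like the x86 one. */
static bool
one_device_iommu_support_va(const struct fake_dev *dev)
{
	return dev->iommu_supports_va;
}

/* Run the (global) DMA mask check only for the first matched device. */
static bool
bus_iommu_support_va(const struct fake_dev *devs, size_t ndevs)
{
	bool check_done = false;
	size_t i;

	for (i = 0; i < ndevs; i++) {
		if (check_done)
			continue;
		/* test the bool directly; "< 0" would never be true */
		if (!one_device_iommu_support_va(&devs[i]))
			return false;
		check_done = true;
	}
	return true;
}
```

Running the check once mirrors the patch: the memseg check is global, so the
result does not change from one device to the next.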


* [dpdk-dev] [PATCH v3 4/6] mem: use address hint for mapping hugepages
  2018-07-04 12:53 [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
                   ` (2 preceding siblings ...)
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 3/6] bus/pci: use IOVAs check when setting IOVA mode Alejandro Lucero
@ 2018-07-04 12:53 ` Alejandro Lucero
  2018-07-10 11:15   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 5/6] net/nfp: check hugepages IOVAs based on DMA mask Alejandro Lucero
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 6/6] net/nfp: support IOVA VA mode Alejandro Lucero
  5 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-04 12:53 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov, maxime.coquelin, ferruh.yigit

The Linux kernel uses a really high address as the starting address for
serving mmap calls. If there are addressing limitations and
IOVA mode is VA, this starting address is likely too high for
those devices. However, it is possible to use a lower address in
the process virtual address space, as with 64 bits there is a lot
of available space.

This patch adds an address hint as the starting address on 64-bit
systems.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 55 ++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 9 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 17c20d4..2ed4017 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -88,6 +88,23 @@
 
 static uint64_t baseaddr_offset;
 
+#ifdef RTE_ARCH_64
+/*
+ * Linux kernel uses a really high address as starting address for serving
+ * mmaps calls. If there exists addressing limitations and IOVA mode is VA,
+ * this starting address is likely too high for those devices. However, it
+ * is possible to use a lower address in the process virtual address space
+ * as with 64 bits there is a lot of available space.
+ *
+ * Current known limitations are 39 or 40 bits. Setting the starting address
+ * at 4GB implies there are 508GB or 1020GB for mapping the available
+ * hugepages. This is likely enough for most systems, although a device with
+ * addressing limitations should call rte_eal_check_dma_mask for ensuring all
+ * memory is within supported range.
+ */
+static uint64_t baseaddr = 0x100000000;
+#endif
+
 static bool phys_addrs_available = true;
 
 #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space"
@@ -250,6 +267,23 @@
 	}
 }
 
+static void *
+get_addr_hint(void)
+{
+	if (internal_config.base_virtaddr != 0) {
+		return (void *) (uintptr_t)
+			    (internal_config.base_virtaddr +
+			     baseaddr_offset);
+	} else {
+#ifdef RTE_ARCH_64
+		return (void *) (uintptr_t) (baseaddr +
+				baseaddr_offset);
+#else
+		return NULL;
+#endif
+	}
+}
+
 /*
  * Try to mmap *size bytes in /dev/zero. If it is successful, return the
  * pointer to the mmap'd area and keep *size unmodified. Else, retry
@@ -260,16 +294,10 @@
 static void *
 get_virtual_area(size_t *size, size_t hugepage_sz)
 {
-	void *addr;
+	void *addr, *addr_hint;
 	int fd;
 	long aligned_addr;
 
-	if (internal_config.base_virtaddr != 0) {
-		addr = (void*) (uintptr_t) (internal_config.base_virtaddr +
-				baseaddr_offset);
-	}
-	else addr = NULL;
-
 	RTE_LOG(DEBUG, EAL, "Ask a virtual area of 0x%zx bytes\n", *size);
 
 	fd = open("/dev/zero", O_RDONLY);
@@ -278,7 +306,9 @@
 		return NULL;
 	}
 	do {
-		addr = mmap(addr,
+		addr_hint = get_addr_hint();
+
+		addr = mmap(addr_hint,
 				(*size) + hugepage_sz, PROT_READ,
 #ifdef RTE_ARCH_PPC_64
 				MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
@@ -286,8 +316,15 @@
 				MAP_PRIVATE,
 #endif
 				fd, 0);
-		if (addr == MAP_FAILED)
+		if (addr == MAP_FAILED) {
+			/* map failed. Let's try with less memory */
 			*size -= hugepage_sz;
+		} else if (addr_hint && addr != addr_hint) {
+			/* hint was not used. Try with another offset */
+			munmap(addr, (*size) + hugepage_sz);
+			addr = MAP_FAILED;
+			baseaddr_offset += 0x100000000;
+		}
 	} while (addr == MAP_FAILED && *size > 0);
 
 	if (addr == MAP_FAILED) {
-- 
1.9.1
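
The hint-and-retry idea in this patch can be sketched in isolation. This is
a simplified illustration for 64-bit Linux, not the EAL code: it maps
anonymous memory instead of /dev/zero, does not shrink the size on failure,
and `map_at_hint` and `MAX_TRIES` are hypothetical:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HINT_BASE 0x100000000ULL /* 4GB, as in the patch */
#define HINT_STEP 0x100000000ULL /* offset added after a rejected hint */
#define MAX_TRIES 16             /* illustrative bound, not in the patch */

/* Map `size` bytes at a low address; return NULL if no hint was honoured. */
static void *
map_at_hint(size_t size)
{
	uint64_t offset = 0;
	int i;

	for (i = 0; i < MAX_TRIES; i++) {
		void *hint = (void *)(uintptr_t)(HINT_BASE + offset);
		void *addr = mmap(hint, size, PROT_READ,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (addr == MAP_FAILED)
			return NULL;
		if (addr == hint)
			return addr;

		/* the kernel placed the mapping elsewhere: undo and retry */
		munmap(addr, size);
		offset += HINT_STEP;
	}
	return NULL;
}
```

Without MAP_FIXED the hint is only a suggestion, which is why the returned
address has to be compared with it. Calling `map_at_hint` twice should
exercise the retry path, since the first mapping already occupies the 4GB
hint.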


* [dpdk-dev] [PATCH v3 5/6] net/nfp: check hugepages IOVAs based on DMA mask
  2018-07-04 12:53 [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
                   ` (3 preceding siblings ...)
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 4/6] mem: use address hint for mapping hugepages Alejandro Lucero
@ 2018-07-04 12:53 ` Alejandro Lucero
  2018-07-10 10:17   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 6/6] net/nfp: support IOVA VA mode Alejandro Lucero
  5 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-04 12:53 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov, maxime.coquelin, ferruh.yigit

NFP devices cannot handle DMA addresses requiring more than
40 bits. This patch uses rte_eth_dev_check_dma_mask with 40 bits
and avoids device initialization if memory is out of the NFP range.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 drivers/net/nfp/nfp_net.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index d9cd047..5976f37 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2649,6 +2649,14 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 
 	pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
 
+	/* NFP can not handle DMA addresses requiring more than 40 bits */
+	if (rte_eth_dev_check_dma_mask(40) < 0) {
+		RTE_LOG(INFO, PMD, "device %s can not be used:",
+				   pci_dev->device.name);
+		RTE_LOG(INFO, PMD, "\trestricted dma mask to 40 bits!\n");
+		return -ENODEV;
+	};
+
 	if ((pci_dev->id.device_id == PCI_DEVICE_ID_NFP4000_PF_NIC) ||
 	    (pci_dev->id.device_id == PCI_DEVICE_ID_NFP6000_PF_NIC)) {
 		port = get_pf_port_number(eth_dev->data->name);
-- 
1.9.1


* [dpdk-dev] [PATCH v3 6/6] net/nfp: support IOVA VA mode
  2018-07-04 12:53 [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
                   ` (4 preceding siblings ...)
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 5/6] net/nfp: check hugepages IOVAs based on DMA mask Alejandro Lucero
@ 2018-07-04 12:53 ` Alejandro Lucero
  2018-07-10 10:18   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
  5 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-04 12:53 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov, maxime.coquelin, ferruh.yigit

NFP can handle IOVA as VA. This requires checking that those IOVAs
are in the supported range, which is done during initialization.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 drivers/net/nfp/nfp_net.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 5976f37..354dec3 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -3053,14 +3053,16 @@ static int eth_nfp_pci_remove(struct rte_pci_device *pci_dev)
 
 static struct rte_pci_driver rte_nfp_net_pf_pmd = {
 	.id_table = pci_id_nfp_pf_net_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = nfp_pf_pci_probe,
 	.remove = eth_nfp_pci_remove,
 };
 
 static struct rte_pci_driver rte_nfp_net_vf_pmd = {
 	.id_table = pci_id_nfp_vf_net_map,
-	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
+		     RTE_PCI_DRV_IOVA_AS_VA,
 	.probe = eth_nfp_pci_probe,
 	.remove = eth_nfp_pci_remove,
 };
-- 
1.9.1


* Re: [dpdk-dev] [PATCH v3 2/6] ethdev: add function for checking IOVAs by a device
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 2/6] ethdev: add function for checking IOVAs by a device Alejandro Lucero
@ 2018-07-07 17:30   ` Andrew Rybchenko
  2018-07-10  8:57   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
  1 sibling, 0 replies; 62+ messages in thread
From: Andrew Rybchenko @ 2018-07-07 17:30 UTC (permalink / raw)
  To: Alejandro Lucero, dev
  Cc: stable, anatoly.burakov, maxime.coquelin, ferruh.yigit

On 04.07.2018 15:53, Alejandro Lucero wrote:
> A PMD should invoke this function for checking memsegs iovas are within
> the supported range by the device.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> ---
>   lib/librte_ether/rte_ethdev.h           | 13 +++++++++++++
>   lib/librte_ether/rte_ethdev_version.map |  1 +
>   2 files changed, 14 insertions(+)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index eba11ca..e51a432 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -2799,6 +2799,19 @@ int rte_eth_dev_set_vlan_ether_type(uint16_t port_id,
>   int rte_eth_dev_set_vlan_pvid(uint16_t port_id, uint16_t pvid, int on);
>   
>   /**
> + * check device dma mask within expected range based on dma mask.
> + *
> + * @param maskbits
> + *  mask length in bits
> + *
> + */
> +static inline int
> +rte_eth_dev_check_dma_mask(uint8_t maskbits)
> +{
> +	return rte_eal_check_dma_mask(maskbits);

I'm afraid I don't understand why we need the wrapper.
Can't the PMD use the EAL function directly?

> +}
> +
> +/**
>    *
>    * Retrieve a burst of input packets from a receive queue of an Ethernet
>    * device. The retrieved packets are stored in *rte_mbuf* structures whose
> diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
> index e9681ac..0b11b8a 100644
> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -191,6 +191,7 @@ DPDK_17.08 {
>   DPDK_17.11 {
>   	global:
>   
> +	rte_eth_dev_check_dma_mask;
>   	rte_eth_dev_get_sec_ctx;
>   	rte_eth_dev_pool_ops_supported;
>   	rte_eth_dev_reset;


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
@ 2018-07-10  8:56   ` Eelco Chaudron
  2018-07-10  9:34     ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Eelco Chaudron @ 2018-07-10  8:56 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, stable, anatoly.burakov, maxime.coquelin, ferruh.yigit



On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:

> A device can suffer addressing limitations. This functions checks
> memsegs have iovas within the supported range based on dma mask.
>
> PMD should use this during initialization if supported devices
> suffer addressing limitations, returning an error if this function
> returns memsegs out of range.
>
> Another potential usage is for emulated IOMMU hardware with addressing
> limitations.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/common/eal_common_memory.c  | 33 ++++++++++++++++++++++++++++++
>  lib/librte_eal/common/include/rte_memory.h |  3 +++
>  lib/librte_eal/rte_eal_version.map         |  1 +
>  3 files changed, 37 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index fc6c44d..f5efebe 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -109,6 +109,39 @@
>  	}
>  }
>
> +/* check memseg iovas are within the required range based on dma mask */
> +int
> +rte_eal_check_dma_mask(uint8_t maskbits)
> +{
> +
> +	const struct rte_mem_config *mcfg;
> +	uint64_t mask;
> +	int i;
> +

I think we should add some sanity check to the input maskbits, i.e. 
[64,0) or [64, 32]? What would be a reasonable lower bound.

> +	/* create dma mask */
> +	mask = ~((1ULL << maskbits) - 1);
> +
> +	/* get pointer to global configuration */
> +	mcfg = rte_eal_get_configuration()->mem_config;
> +
> +	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
> +		if (mcfg->memseg[i].addr == NULL)
> +			break;
> +
> +		if (mcfg->memseg[i].iova & mask) {
> +			RTE_LOG(INFO, EAL,
> +				"memseg[%d] iova %"PRIx64" out of range:\n",
> +				i, mcfg->memseg[i].iova);
> +
> +			RTE_LOG(INFO, EAL, "\tusing dma mask %"PRIx64"\n",
> +				mask);
> +			return -1;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  /* return the number of memory channels */
>  unsigned rte_memory_get_nchannel(void)
>  {
> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
> index 80a8fc0..b2a0168 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -209,6 +209,9 @@ struct rte_memseg {
>   */
>  unsigned rte_memory_get_nrank(void);
>
> +/* check memsegs iovas are within a range based on dma mask */
> +int rte_eal_check_dma_mask(uint8_t maskbits);
> +
>  /**
>   * Drivers based on uio will not load unless physical
>   * addresses are obtainable. It is only possible to get
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index f4f46c1..aa6cf87 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -184,6 +184,7 @@ DPDK_17.11 {
>
>  	rte_eal_create_uio_dev;
>  	rte_bus_get_iommu_class;
> +	rte_eal_check_dma_mask;
>  	rte_eal_has_pci;
>  	rte_eal_iova_mode;
>  	rte_eal_mbuf_default_mempool_ops;
> -- 
> 1.9.1
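
The sanity check asked for above could look like the following sketch. The
thread leaves the lower bound open, so rejecting 0 is only an assumption
made here, and `dma_mask_from_bits` is a hypothetical helper, not an EAL
function:

```c
#include <stdint.h>

/*
 * Build the mask of bits a device cannot address, given that it can address
 * `maskbits` bits. Returns 0 on success, -1 if maskbits is outside [1, 64].
 */
static int
dma_mask_from_bits(uint8_t maskbits, uint64_t *mask)
{
	if (maskbits == 0 || maskbits > 64)
		return -1;

	/* 1ULL << 64 is undefined behaviour, so handle 64 separately */
	*mask = (maskbits == 64) ? 0 : ~((1ULL << maskbits) - 1);
	return 0;
}
```

For 40 bits the mask is 0xFFFFFF0000000000; for 64 bits it is 0, meaning every
IOVA is accepted.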


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 2/6] ethdev: add function for checking IOVAs by a device
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 2/6] ethdev: add function for checking IOVAs by a device Alejandro Lucero
  2018-07-07 17:30   ` Andrew Rybchenko
@ 2018-07-10  8:57   ` Eelco Chaudron
  2018-07-10  9:42     ` Alejandro Lucero
  1 sibling, 1 reply; 62+ messages in thread
From: Eelco Chaudron @ 2018-07-10  8:57 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, stable, anatoly.burakov, maxime.coquelin, ferruh.yigit,
	Andrew Rybchenko



On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:

> A PMD should invoke this function for checking memsegs iovas are within
> the supported range by the device.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>

Agree with Andrew here, why not call rte_eal_check_dma_mask() directly 
in nfp_net_txq_full()?

> ---
>  lib/librte_ether/rte_ethdev.h           | 13 +++++++++++++
>  lib/librte_ether/rte_ethdev_version.map |  1 +
>  2 files changed, 14 insertions(+)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index eba11ca..e51a432 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -2799,6 +2799,19 @@ int rte_eth_dev_set_vlan_ether_type(uint16_t port_id,
>  int rte_eth_dev_set_vlan_pvid(uint16_t port_id, uint16_t pvid, int on);
>
>  /**
> + * check device dma mask within expected range based on dma mask.
> + *
> + * @param maskbits
> + *  mask length in bits
> + *
> + */
> +static inline int
> +rte_eth_dev_check_dma_mask(uint8_t maskbits)
> +{
> +	return rte_eal_check_dma_mask(maskbits);
> +}
> +
> +/**
>   *
>   * Retrieve a burst of input packets from a receive queue of an Ethernet
>   * device. The retrieved packets are stored in *rte_mbuf* structures whose
> diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
> index e9681ac..0b11b8a 100644
> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -191,6 +191,7 @@ DPDK_17.08 {
>  DPDK_17.11 {
>  	global:
>
> +	rte_eth_dev_check_dma_mask;
>  	rte_eth_dev_get_sec_ctx;
>  	rte_eth_dev_pool_ops_supported;
>  	rte_eth_dev_reset;
> -- 
> 1.9.1


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
  2018-07-10  8:56   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
@ 2018-07-10  9:34     ` Alejandro Lucero
  2018-07-10 10:06       ` Eelco Chaudron
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-10  9:34 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: dev, stable, Burakov, Anatoly, Maxime Coquelin, Ferruh Yigit

On Tue, Jul 10, 2018 at 9:56 AM, Eelco Chaudron <echaudro@redhat.com> wrote:

>
>
> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>
>> A device can suffer addressing limitations. This functions checks
>> memsegs have iovas within the supported range based on dma mask.
>>
>> PMD should use this during initialization if supported devices
>> suffer addressing limitations, returning an error if this function
>> returns memsegs out of range.
>>
>> Another potential usage is for emulated IOMMU hardware with addressing
>> limitations.
>>
>> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
>> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>  lib/librte_eal/common/eal_common_memory.c  | 33 ++++++++++++++++++++++++++++++
>>  lib/librte_eal/common/include/rte_memory.h |  3 +++
>>  lib/librte_eal/rte_eal_version.map         |  1 +
>>  3 files changed, 37 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
>> index fc6c44d..f5efebe 100644
>> --- a/lib/librte_eal/common/eal_common_memory.c
>> +++ b/lib/librte_eal/common/eal_common_memory.c
>> @@ -109,6 +109,39 @@
>>         }
>>  }
>>
>> +/* check memseg iovas are within the required range based on dma mask */
>> +int
>> +rte_eal_check_dma_mask(uint8_t maskbits)
>> +{
>> +
>> +       const struct rte_mem_config *mcfg;
>> +       uint64_t mask;
>> +       int i;
>> +
>>
>
> I think we should add some sanity check to the input maskbits, i.e. [64,0)
> or [64, 32]? What would be a reasonable lower bound.
>
>
This is not a user-facing API, so any invocation will be reviewed, but I
guess adding a sanity check here does no harm.

Not sure about the lower bound, but the upper one should be 64; that value
does not make much sense, but it is safe. The lower bound is not so
problematic.


>
>> +       /* create dma mask */
>> +       mask = ~((1ULL << maskbits) - 1);
>> +
>> +       /* get pointer to global configuration */
>> +       mcfg = rte_eal_get_configuration()->mem_config;
>> +
>> +       for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>> +               if (mcfg->memseg[i].addr == NULL)
>> +                       break;
>> +
>> +               if (mcfg->memseg[i].iova & mask) {
>> +                       RTE_LOG(INFO, EAL,
>> +                               "memseg[%d] iova %"PRIx64" out of range:\n",
>> +                               i, mcfg->memseg[i].iova);
>> +
>> +                       RTE_LOG(INFO, EAL, "\tusing dma mask %"PRIx64"\n",
>> +                               mask);
>> +                       return -1;
>> +               }
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>  /* return the number of memory channels */
>>  unsigned rte_memory_get_nchannel(void)
>>  {
>> diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
>> index 80a8fc0..b2a0168 100644
>> --- a/lib/librte_eal/common/include/rte_memory.h
>> +++ b/lib/librte_eal/common/include/rte_memory.h
>> @@ -209,6 +209,9 @@ struct rte_memseg {
>>   */
>>  unsigned rte_memory_get_nrank(void);
>>
>> +/* check memsegs iovas are within a range based on dma mask */
>> +int rte_eal_check_dma_mask(uint8_t maskbits);
>> +
>>  /**
>>   * Drivers based on uio will not load unless physical
>>   * addresses are obtainable. It is only possible to get
>> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
>> index f4f46c1..aa6cf87 100644
>> --- a/lib/librte_eal/rte_eal_version.map
>> +++ b/lib/librte_eal/rte_eal_version.map
>> @@ -184,6 +184,7 @@ DPDK_17.11 {
>>
>>         rte_eal_create_uio_dev;
>>         rte_bus_get_iommu_class;
>> +       rte_eal_check_dma_mask;
>>         rte_eal_has_pci;
>>         rte_eal_iova_mode;
>>         rte_eal_mbuf_default_mempool_ops;
>> --
>> 1.9.1
>>
>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 2/6] ethdev: add function for checking IOVAs by a device
  2018-07-10  8:57   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
@ 2018-07-10  9:42     ` Alejandro Lucero
  2018-07-10  9:44       ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-10  9:42 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: dev, stable, Burakov, Anatoly, Maxime Coquelin, Ferruh Yigit,
	Andrew Rybchenko

On Tue, Jul 10, 2018 at 9:57 AM, Eelco Chaudron <echaudro@redhat.com> wrote:

>
>
> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>
>> A PMD should invoke this function for checking memsegs iovas are within
>> the supported range by the device.
>>
>> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
>>
>
> Agree with Andrew here, why not call rte_eal_check_dma_mask() directly in
> nfp_net_txq_full()?
>
>
My idea was to add this indirection to handle the DMA mask when only some
of the IOVAs are unusable. For now, if the DMA mask check finds a problem,
the PMD does not initialize any port.

Memory management is changing and ideally an app should just allocate
memory safe to be used by the PMD when that memory is going to be used for
sending or receiving data, which is not always the case.

It is true that this indirection serves no purpose for now, so yes, I
could call the EAL function directly.


>
> ---
>>  lib/librte_ether/rte_ethdev.h           | 13 +++++++++++++
>>  lib/librte_ether/rte_ethdev_version.map |  1 +
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.
>> h
>> index eba11ca..e51a432 100644
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -2799,6 +2799,19 @@ int rte_eth_dev_set_vlan_ether_type(uint16_t
>> port_id,
>>  int rte_eth_dev_set_vlan_pvid(uint16_t port_id, uint16_t pvid, int on);
>>
>>  /**
>> + * check device dma mask within expected range based on dma mask.
>> + *
>> + * @param maskbits
>> + *  mask length in bits
>> + *
>> + */
>> +static inline int
>> +rte_eth_dev_check_dma_mask(uint8_t maskbits)
>> +{
>> +       return rte_eal_check_dma_mask(maskbits);
>> +}
>> +
>> +/**
>>   *
>>   * Retrieve a burst of input packets from a receive queue of an Ethernet
>>   * device. The retrieved packets are stored in *rte_mbuf* structures
>> whose
>> diff --git a/lib/librte_ether/rte_ethdev_version.map
>> b/lib/librte_ether/rte_ethdev_version.map
>> index e9681ac..0b11b8a 100644
>> --- a/lib/librte_ether/rte_ethdev_version.map
>> +++ b/lib/librte_ether/rte_ethdev_version.map
>> @@ -191,6 +191,7 @@ DPDK_17.08 {
>>  DPDK_17.11 {
>>         global:
>>
>> +       rte_eth_dev_check_dma_mask;
>>         rte_eth_dev_get_sec_ctx;
>>         rte_eth_dev_pool_ops_supported;
>>         rte_eth_dev_reset;
>> --
>> 1.9.1
>>
>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 2/6] ethdev: add function for checking IOVAs by a device
  2018-07-10  9:42     ` Alejandro Lucero
@ 2018-07-10  9:44       ` Alejandro Lucero
  0 siblings, 0 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-10  9:44 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: dev, stable, Burakov, Anatoly, Maxime Coquelin, Ferruh Yigit,
	Andrew Rybchenko

On Tue, Jul 10, 2018 at 10:42 AM, Alejandro Lucero <
alejandro.lucero@netronome.com> wrote:

>
>
> On Tue, Jul 10, 2018 at 9:57 AM, Eelco Chaudron <echaudro@redhat.com>
> wrote:
>
>>
>>
>> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>>
>> A PMD should invoke this function for checking memsegs iovas are within
>>> the supported range by the device.
>>>
>>> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
>>>
>>
>> Agree with Andrew here, why not call rte_eal_check_dma_mask() directly in
>> nfp_net_txq_full()?
>>
>>
BTW, IOVA checking cannot be done inside PMD functions like that one
because they are in the fast path. Any IOVA checking needs to be done at
initialization time.
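
As a rough illustration of that point, the sketch below runs the range
check once from an init hook rather than from any per-packet function. It
is self-contained and uses hypothetical stand-ins (demo_memseg,
demo_check_dma_mask, demo_pmd_init), not the DPDK structures or the
function from this patch. The guard on maskbits is an added assumption,
since shifting a 64-bit value by 64 is undefined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pre-18.05 memseg entry; not the DPDK struct. */
struct demo_memseg {
	void *addr;     /* NULL marks the end of the populated entries */
	uint64_t iova;  /* IO address the device would use for this segment */
};

/*
 * Return 0 if every populated segment fits in maskbits address bits,
 * -1 otherwise. The bounds check on maskbits is an assumption added for
 * this sketch.
 */
static int
demo_check_dma_mask(const struct demo_memseg *segs, size_t n, uint8_t maskbits)
{
	uint64_t mask;
	size_t i;

	if (maskbits == 0 || maskbits > 64)
		return -1;
	if (maskbits == 64)
		return 0; /* every 64-bit address fits, and no shift by 64 */

	/* bits that must be zero in any usable IOVA */
	mask = ~((UINT64_C(1) << maskbits) - 1);

	for (i = 0; i < n && segs[i].addr != NULL; i++) {
		if (segs[i].iova & mask)
			return -1;
	}
	return 0;
}

/* Hypothetical init hook: the check runs once here, never per packet. */
static int
demo_pmd_init(const struct demo_memseg *segs, size_t n)
{
	if (demo_check_dma_mask(segs, n, 40) != 0)
		return -1; /* abort port initialization */
	return 0;
}
```

A caller would build the segment table and invoke demo_pmd_init() once at
start-up; a return of -1 means port initialization should be abandoned.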


>
> My idea was to add this indirection for handling dma mask when just part
> of the IOVAs are not usable. Now, if the dma mask finds a problem, the PMD
> does not make any port initialization.
>
> Memory management is changing and ideally an app should just allocate
> memory safe to be used by the PMD when that memory is going to be used for
> sending or receiving data, what is not always the case.
>
> It is true this indirection is not being used for any purpose by now, so
> yes, I could use a direct call the the EAL one.
>
>
>>
>> ---
>>>  lib/librte_ether/rte_ethdev.h           | 13 +++++++++++++
>>>  lib/librte_ether/rte_ethdev_version.map |  1 +
>>>  2 files changed, 14 insertions(+)
>>>
>>> diff --git a/lib/librte_ether/rte_ethdev.h
>>> b/lib/librte_ether/rte_ethdev.h
>>> index eba11ca..e51a432 100644
>>> --- a/lib/librte_ether/rte_ethdev.h
>>> +++ b/lib/librte_ether/rte_ethdev.h
>>> @@ -2799,6 +2799,19 @@ int rte_eth_dev_set_vlan_ether_type(uint16_t
>>> port_id,
>>>  int rte_eth_dev_set_vlan_pvid(uint16_t port_id, uint16_t pvid, int on);
>>>
>>>  /**
>>> + * check device dma mask within expected range based on dma mask.
>>> + *
>>> + * @param maskbits
>>> + *  mask length in bits
>>> + *
>>> + */
>>> +static inline int
>>> +rte_eth_dev_check_dma_mask(uint8_t maskbits)
>>> +{
>>> +       return rte_eal_check_dma_mask(maskbits);
>>> +}
>>> +
>>> +/**
>>>   *
>>>   * Retrieve a burst of input packets from a receive queue of an Ethernet
>>>   * device. The retrieved packets are stored in *rte_mbuf* structures
>>> whose
>>> diff --git a/lib/librte_ether/rte_ethdev_version.map
>>> b/lib/librte_ether/rte_ethdev_version.map
>>> index e9681ac..0b11b8a 100644
>>> --- a/lib/librte_ether/rte_ethdev_version.map
>>> +++ b/lib/librte_ether/rte_ethdev_version.map
>>> @@ -191,6 +191,7 @@ DPDK_17.08 {
>>>  DPDK_17.11 {
>>>         global:
>>>
>>> +       rte_eth_dev_check_dma_mask;
>>>         rte_eth_dev_get_sec_ctx;
>>>         rte_eth_dev_pool_ops_supported;
>>>         rte_eth_dev_reset;
>>> --
>>> 1.9.1
>>>
>>
>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
  2018-07-10  9:34     ` Alejandro Lucero
@ 2018-07-10 10:06       ` Eelco Chaudron
  2018-07-10 10:52         ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Eelco Chaudron @ 2018-07-10 10:06 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, stable, Burakov, Anatoly, Maxime Coquelin, Ferruh Yigit



On 10 Jul 2018, at 11:34, Alejandro Lucero wrote:

> On Tue, Jul 10, 2018 at 9:56 AM, Eelco Chaudron <echaudro@redhat.com> 
> wrote:
>
>>
>>
>> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>>
>> A device can suffer addressing limitations. This functions checks
>>> memsegs have iovas within the supported range based on dma mask.
>>>
>>> PMD should use this during initialization if supported devices
>>> suffer addressing limitations, returning an error if this function
>>> returns memsegs out of range.
>>>
>>> Another potential usage is for emulated IOMMU hardware with 
>>> addressing
>>> limitations.
>>>
>>> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
>>> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>> ---
>>>  lib/librte_eal/common/eal_common_memory.c  | 33
>>> ++++++++++++++++++++++++++++++
>>>  lib/librte_eal/common/include/rte_memory.h |  3 +++
>>>  lib/librte_eal/rte_eal_version.map         |  1 +
>>>  3 files changed, 37 insertions(+)
>>>
>>> diff --git a/lib/librte_eal/common/eal_common_memory.c
>>> b/lib/librte_eal/common/eal_common_memory.c
>>> index fc6c44d..f5efebe 100644
>>> --- a/lib/librte_eal/common/eal_common_memory.c
>>> +++ b/lib/librte_eal/common/eal_common_memory.c
>>> @@ -109,6 +109,39 @@
>>>         }
>>>  }
>>>
>>> +/* check memseg iovas are within the required range based on dma 
>>> mask */
>>> +int
>>> +rte_eal_check_dma_mask(uint8_t maskbits)
>>> +{
>>> +
>>> +       const struct rte_mem_config *mcfg;
>>> +       uint64_t mask;
>>> +       int i;
>>> +
>>>
>>
>> I think we should add some sanity check to the input maskbits, i.e. 
>> [64,0)
>> or [64, 32]? What would be a reasonable lower bound.
>>
>>
> This is not a user's API, so any invocation will be reviewed, but I 
> guess
> adding a sanity check here does not harm.
>
> Not sure about lower bound but upper should 64, although it does not 
> make
> sense but it is safe. Lower bound is not so problematic.
>
>
>>
>> +       /* create dma mask */
>>> +       mask = ~((1ULL << maskbits) - 1);
>>> +
>>> +       /* get pointer to global configuration */
>>> +       mcfg = rte_eal_get_configuration()->mem_config;
>>> +
>>> +       for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>> +               if (mcfg->memseg[i].addr == NULL)
>>> +                       break;

Looking at some other code, it looks like NULL entries might exist. So
should a continue; rather than a break; be used here?

>>> +
>>> +               if (mcfg->memseg[i].iova & mask) {
>>> +                       RTE_LOG(INFO, EAL,
>>> +                               "memseg[%d] iova %"PRIx64" out of
>>> range:\n",
>>> +                               i, mcfg->memseg[i].iova);
>>> +
>>> +                       RTE_LOG(INFO, EAL, "\tusing dma mask 
>>> %"PRIx64"\n",
>>> +                               mask);
>>> +                       return -1;
>>> +               }
>>> +       }
>>> +
>>> +       return 0;
>>> +}
>>> +
>>>  /* return the number of memory channels */
>>>  unsigned rte_memory_get_nchannel(void)
>>>  {
>>> diff --git a/lib/librte_eal/common/include/rte_memory.h
>>> b/lib/librte_eal/common/include/rte_memory.h
>>> index 80a8fc0..b2a0168 100644
>>> --- a/lib/librte_eal/common/include/rte_memory.h
>>> +++ b/lib/librte_eal/common/include/rte_memory.h
>>> @@ -209,6 +209,9 @@ struct rte_memseg {
>>>   */
>>>  unsigned rte_memory_get_nrank(void);
>>>
>>> +/* check memsegs iovas are within a range based on dma mask */
>>> +int rte_eal_check_dma_mask(uint8_t maskbits);
>>> +
>>>  /**
>>>   * Drivers based on uio will not load unless physical
>>>   * addresses are obtainable. It is only possible to get
>>> diff --git a/lib/librte_eal/rte_eal_version.map
>>> b/lib/librte_eal/rte_eal_version.map
>>> index f4f46c1..aa6cf87 100644
>>> --- a/lib/librte_eal/rte_eal_version.map
>>> +++ b/lib/librte_eal/rte_eal_version.map
>>> @@ -184,6 +184,7 @@ DPDK_17.11 {
>>>
>>>         rte_eal_create_uio_dev;
>>>         rte_bus_get_iommu_class;
>>> +       rte_eal_check_dma_mask;
>>>         rte_eal_has_pci;
>>>         rte_eal_iova_mode;
>>>         rte_eal_mbuf_default_mempool_ops;
>>> --
>>> 1.9.1
>>>
>>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 3/6] bus/pci: use IOVAs check when setting IOVA mode
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 3/6] bus/pci: use IOVAs check when setting IOVA mode Alejandro Lucero
@ 2018-07-10 10:14   ` Eelco Chaudron
  2018-07-10 15:37     ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Eelco Chaudron @ 2018-07-10 10:14 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, stable, anatoly.burakov, maxime.coquelin, ferruh.yigit



On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:

> Although VT-d emulation currently only supports 39 bits, it could
> be iovas being within that supported range. This patch allows
> IOVA mode in such a case.
>
> Indeed, memory initialization code can be modified for using lower
> virtual addresses than those used by the kernel for 64 bits processes
> by default, and therefore memsegs iovas can use 39 bits or less for
> most system. And this is likely 100% true for VMs.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> ---
>  drivers/bus/pci/linux/pci.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index 74deef3..792c819 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -43,6 +43,7 @@
>  #include <rte_devargs.h>
>  #include <rte_memcpy.h>
>  #include <rte_vfio.h>
> +#include <rte_memory.h>
>
>  #include "eal_private.h"
>  #include "eal_filesystem.h"
> @@ -613,10 +614,12 @@
>  	fclose(fp);
>
>  	mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 
> 1;
> -	if (mgaw < X86_VA_WIDTH)
> +
> +	if (!rte_eal_check_dma_mask(mgaw))

I think in this case we still need to check the X86_VA_WIDTH, i.e.
if (mgaw < X86_VA_WIDTH && !rte_eal_check_dma_mask(mgaw))


> +		return true;
> +	else
>  		return false;
>
> -	return true;
>  }
>  #elif defined(RTE_ARCH_PPC_64)
>  static bool
> @@ -640,13 +643,17 @@
>  {
>  	struct rte_pci_device *dev = NULL;
>  	struct rte_pci_driver *drv = NULL;
> +	int iommu_dma_mask_check_done = 0;
>
>  	FOREACH_DRIVER_ON_PCIBUS(drv) {
>  		FOREACH_DEVICE_ON_PCIBUS(dev) {
>  			if (!rte_pci_match(drv, dev))
>  				continue;
> -			if (!pci_one_device_iommu_support_va(dev))
> -				return false;
> +			if (!iommu_dma_mask_check_done) {
> +				if (pci_one_device_iommu_support_va(dev) < 0)
> +					return false;
> +				iommu_dma_mask_check_done  = 1;

Not sure why this change was made. Why do we only need to check one
device on the bus?

In addition, if this is what was intended, you can return true in this
case rather than using a variable. Or did you intend to clear
iommu_dma_mask_check_done on every PCI bus iteration?
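
To make the question concrete, here is a minimal sketch comparing the
flag-based loop with a plain early return. It is not the patch code: the
nested driver/device loops are flattened into one array, and demo_dev, its
fields, and both functions are hypothetical stand-ins (matches for
rte_pci_match(), va_ok for the per-device check, assumed here to be a
boolean).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record used only for this illustration. */
struct demo_dev {
	bool matches; /* stands in for the driver/device match */
	bool va_ok;   /* stands in for the per-device IOMMU check result */
};

/* Flag-based form: only the first matching device is ever checked. */
static bool
demo_check_with_flag(const struct demo_dev *devs, size_t n)
{
	bool check_done = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!devs[i].matches)
			continue;
		if (!check_done) {
			if (!devs[i].va_ok)
				return false;
			check_done = true;
		}
	}
	return true;
}

/* Early-return form: same result without the flag. */
static bool
demo_check_early_return(const struct demo_dev *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].matches)
			return devs[i].va_ok;
	}
	return true; /* no matching device at all */
}
```

Under these assumptions the two forms always agree, and in both a later
matching device that would fail the check is never examined.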

> +			}
>  		}
>  	}
>  	return true;
> -- 
> 1.9.1


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 5/6] net/nfp: check hugepages IOVAs based on DMA mask
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 5/6] net/nfp: check hugepages IOVAs based on DMA mask Alejandro Lucero
@ 2018-07-10 10:17   ` Eelco Chaudron
  0 siblings, 0 replies; 62+ messages in thread
From: Eelco Chaudron @ 2018-07-10 10:17 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, stable, anatoly.burakov, maxime.coquelin, ferruh.yigit



On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:

> NFP devices can not handle DMA addresses requiring more than
> 40 bits. This patch uses rte_dev_check_dma_mask with 40 bits
> and avoids device initialization if memory out of NFP range.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>

Changes look good to me!

Acked-by: Eelco Chaudron <echaudro@redhat.com>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 6/6] net/nfp: support IOVA VA mode
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 6/6] net/nfp: support IOVA VA mode Alejandro Lucero
@ 2018-07-10 10:18   ` Eelco Chaudron
  0 siblings, 0 replies; 62+ messages in thread
From: Eelco Chaudron @ 2018-07-10 10:18 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, stable, anatoly.burakov, maxime.coquelin, ferruh.yigit



On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:

> NFP can handle IOVA as VA. It requires to check those IOVAs
> being in the supported range what is done during initialization.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>

Changes look good to me!

Acked-by: Eelco Chaudron <echaudro@redhat.com>

<SNIP>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
  2018-07-10 10:06       ` Eelco Chaudron
@ 2018-07-10 10:52         ` Alejandro Lucero
  2018-07-10 11:14           ` Eelco Chaudron
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-10 10:52 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: dev, stable, Burakov, Anatoly, Maxime Coquelin, Ferruh Yigit

On Tue, Jul 10, 2018 at 11:06 AM, Eelco Chaudron <echaudro@redhat.com>
wrote:

>
>
> On 10 Jul 2018, at 11:34, Alejandro Lucero wrote:
>
> On Tue, Jul 10, 2018 at 9:56 AM, Eelco Chaudron <echaudro@redhat.com>
>> wrote:
>>
>>
>>>
>>> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>>>
>>> A device can suffer addressing limitations. This functions checks
>>>
>>>> memsegs have iovas within the supported range based on dma mask.
>>>>
>>>> PMD should use this during initialization if supported devices
>>>> suffer addressing limitations, returning an error if this function
>>>> returns memsegs out of range.
>>>>
>>>> Another potential usage is for emulated IOMMU hardware with addressing
>>>> limitations.
>>>>
>>>> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
>>>> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>> ---
>>>>  lib/librte_eal/common/eal_common_memory.c  | 33
>>>> ++++++++++++++++++++++++++++++
>>>>  lib/librte_eal/common/include/rte_memory.h |  3 +++
>>>>  lib/librte_eal/rte_eal_version.map         |  1 +
>>>>  3 files changed, 37 insertions(+)
>>>>
>>>> diff --git a/lib/librte_eal/common/eal_common_memory.c
>>>> b/lib/librte_eal/common/eal_common_memory.c
>>>> index fc6c44d..f5efebe 100644
>>>> --- a/lib/librte_eal/common/eal_common_memory.c
>>>> +++ b/lib/librte_eal/common/eal_common_memory.c
>>>> @@ -109,6 +109,39 @@
>>>>         }
>>>>  }
>>>>
>>>> +/* check memseg iovas are within the required range based on dma mask
>>>> */
>>>> +int
>>>> +rte_eal_check_dma_mask(uint8_t maskbits)
>>>> +{
>>>> +
>>>> +       const struct rte_mem_config *mcfg;
>>>> +       uint64_t mask;
>>>> +       int i;
>>>> +
>>>>
>>>>
>>> I think we should add some sanity check to the input maskbits, i.e.
>>> [64,0)
>>> or [64, 32]? What would be a reasonable lower bound.
>>>
>>>
>>> This is not a user's API, so any invocation will be reviewed, but I guess
>> adding a sanity check here does not harm.
>>
>> Not sure about lower bound but upper should 64, although it does not make
>> sense but it is safe. Lower bound is not so problematic.
>>
>>
>>
>>> +       /* create dma mask */
>>>
>>>> +       mask = ~((1ULL << maskbits) - 1);
>>>> +
>>>> +       /* get pointer to global configuration */
>>>> +       mcfg = rte_eal_get_configuration()->mem_config;
>>>> +
>>>> +       for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>> +               if (mcfg->memseg[i].addr == NULL)
>>>> +                       break;
>>>>
>>>
> Looking at some other code, it looks like NULL entries might exists. So
> should a continue; rather than a break; be used here?
>
>
I do not think so. memsegs are allocated sequentially, so the first one
with addr set to NULL implies there are no more memsegs.
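
A minimal sketch of that kind of walk, assuming a packed table in which
used entries always come first. The names (demo_seg, DEMO_MAX_MEMSEG,
demo_total_len) are hypothetical and the struct is not the DPDK one; it
only shows why break is enough when entries are never left empty in the
middle.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_MEMSEG 8 /* hypothetical; stands in for RTE_MAX_MEMSEG */

/* Hypothetical table entry; addr == NULL means the slot is unused. */
struct demo_seg {
	void *addr;
	uint64_t len;
};

/* Sum the lengths of the populated entries of a packed table. */
static uint64_t
demo_total_len(const struct demo_seg segs[DEMO_MAX_MEMSEG])
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < DEMO_MAX_MEMSEG; i++) {
		if (segs[i].addr == NULL)
			break; /* packed table: nothing populated after this */
		total += segs[i].len;
	}
	return total;
}
```

If the table could contain holes, break would silently skip every entry
after the first hole and continue would be needed instead, which is the
distinction being asked about.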


>
> +
>>>> +               if (mcfg->memseg[i].iova & mask) {
>>>> +                       RTE_LOG(INFO, EAL,
>>>> +                               "memseg[%d] iova %"PRIx64" out of
>>>> range:\n",
>>>> +                               i, mcfg->memseg[i].iova);
>>>> +
>>>> +                       RTE_LOG(INFO, EAL, "\tusing dma mask
>>>> %"PRIx64"\n",
>>>> +                               mask);
>>>> +                       return -1;
>>>> +               }
>>>> +       }
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>>  /* return the number of memory channels */
>>>>  unsigned rte_memory_get_nchannel(void)
>>>>  {
>>>> diff --git a/lib/librte_eal/common/include/rte_memory.h
>>>> b/lib/librte_eal/common/include/rte_memory.h
>>>> index 80a8fc0..b2a0168 100644
>>>> --- a/lib/librte_eal/common/include/rte_memory.h
>>>> +++ b/lib/librte_eal/common/include/rte_memory.h
>>>> @@ -209,6 +209,9 @@ struct rte_memseg {
>>>>   */
>>>>  unsigned rte_memory_get_nrank(void);
>>>>
>>>> +/* check memsegs iovas are within a range based on dma mask */
>>>> +int rte_eal_check_dma_mask(uint8_t maskbits);
>>>> +
>>>>  /**
>>>>   * Drivers based on uio will not load unless physical
>>>>   * addresses are obtainable. It is only possible to get
>>>> diff --git a/lib/librte_eal/rte_eal_version.map
>>>> b/lib/librte_eal/rte_eal_version.map
>>>> index f4f46c1..aa6cf87 100644
>>>> --- a/lib/librte_eal/rte_eal_version.map
>>>> +++ b/lib/librte_eal/rte_eal_version.map
>>>> @@ -184,6 +184,7 @@ DPDK_17.11 {
>>>>
>>>>         rte_eal_create_uio_dev;
>>>>         rte_bus_get_iommu_class;
>>>> +       rte_eal_check_dma_mask;
>>>>         rte_eal_has_pci;
>>>>         rte_eal_iova_mode;
>>>>         rte_eal_mbuf_default_mempool_ops;
>>>> --
>>>> 1.9.1
>>>>
>>>>
>>>
>
>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
  2018-07-10 10:52         ` Alejandro Lucero
@ 2018-07-10 11:14           ` Eelco Chaudron
  2018-07-10 11:33             ` Burakov, Anatoly
  2018-07-10 11:40             ` Alejandro Lucero
  0 siblings, 2 replies; 62+ messages in thread
From: Eelco Chaudron @ 2018-07-10 11:14 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, stable, Burakov, Anatoly, Maxime Coquelin, Ferruh Yigit



On 10 Jul 2018, at 12:52, Alejandro Lucero wrote:

> On Tue, Jul 10, 2018 at 11:06 AM, Eelco Chaudron <echaudro@redhat.com>
> wrote:
>
>>
>>
>> On 10 Jul 2018, at 11:34, Alejandro Lucero wrote:
>>
>> On Tue, Jul 10, 2018 at 9:56 AM, Eelco Chaudron <echaudro@redhat.com>
>>> wrote:
>>>
>>>
>>>>
>>>> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>>>>
>>>> A device can suffer addressing limitations. This functions checks
>>>>
>>>>> memsegs have iovas within the supported range based on dma mask.
>>>>>
>>>>> PMD should use this during initialization if supported devices
>>>>> suffer addressing limitations, returning an error if this function
>>>>> returns memsegs out of range.
>>>>>
>>>>> Another potential usage is for emulated IOMMU hardware with 
>>>>> addressing
>>>>> limitations.
>>>>>
>>>>> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
>>>>> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>> ---
>>>>>  lib/librte_eal/common/eal_common_memory.c  | 33
>>>>> ++++++++++++++++++++++++++++++
>>>>>  lib/librte_eal/common/include/rte_memory.h |  3 +++
>>>>>  lib/librte_eal/rte_eal_version.map         |  1 +
>>>>>  3 files changed, 37 insertions(+)
>>>>>
>>>>> diff --git a/lib/librte_eal/common/eal_common_memory.c
>>>>> b/lib/librte_eal/common/eal_common_memory.c
>>>>> index fc6c44d..f5efebe 100644
>>>>> --- a/lib/librte_eal/common/eal_common_memory.c
>>>>> +++ b/lib/librte_eal/common/eal_common_memory.c
>>>>> @@ -109,6 +109,39 @@
>>>>>         }
>>>>>  }
>>>>>
>>>>> +/* check memseg iovas are within the required range based on dma 
>>>>> mask
>>>>> */
>>>>> +int
>>>>> +rte_eal_check_dma_mask(uint8_t maskbits)
>>>>> +{
>>>>> +
>>>>> +       const struct rte_mem_config *mcfg;
>>>>> +       uint64_t mask;
>>>>> +       int i;
>>>>> +
>>>>>
>>>>>
>>>> I think we should add some sanity check to the input maskbits, i.e.
>>>> [64,0)
>>>> or [64, 32]? What would be a reasonable lower bound.
>>>>
>>>>
>>>> This is not a user's API, so any invocation will be reviewed, but I 
>>>> guess
>>> adding a sanity check here does not harm.
>>>
>>> Not sure about lower bound but upper should 64, although it does not 
>>> make
>>> sense but it is safe. Lower bound is not so problematic.
>>>
>>>
>>>
>>>> +       /* create dma mask */
>>>>
>>>>> +       mask = ~((1ULL << maskbits) - 1);
>>>>> +
>>>>> +       /* get pointer to global configuration */
>>>>> +       mcfg = rte_eal_get_configuration()->mem_config;
>>>>> +
>>>>> +       for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>> +               if (mcfg->memseg[i].addr == NULL)
>>>>> +                       break;
>>>>>
>>>>
>> Looking at some other code, it looks like NULL entries might exists. 
>> So
>> should a continue; rather than a break; be used here?
>>
>>
> I do not think so. memsegs are allocated sequentially, so first with 
> addr
> as NULL implies no more memsegs.

I was referring to the mem walk functions, rte_memseg_list_walk(). Maybe
someone with more experience in this area can review/comment.

>
>
>>
>> +
>>>>> +               if (mcfg->memseg[i].iova & mask) {
>>>>> +                       RTE_LOG(INFO, EAL,
>>>>> +                               "memseg[%d] iova %"PRIx64" out of
>>>>> range:\n",
>>>>> +                               i, mcfg->memseg[i].iova);
>>>>> +
>>>>> +                       RTE_LOG(INFO, EAL, "\tusing dma mask
>>>>> %"PRIx64"\n",
>>>>> +                               mask);
>>>>> +                       return -1;
>>>>> +               }
>>>>> +       }
>>>>> +
>>>>> +       return 0;
>>>>> +}
>>>>> +
>>>>>  /* return the number of memory channels */
>>>>>  unsigned rte_memory_get_nchannel(void)
>>>>>  {
>>>>> diff --git a/lib/librte_eal/common/include/rte_memory.h
>>>>> b/lib/librte_eal/common/include/rte_memory.h
>>>>> index 80a8fc0..b2a0168 100644
>>>>> --- a/lib/librte_eal/common/include/rte_memory.h
>>>>> +++ b/lib/librte_eal/common/include/rte_memory.h
>>>>> @@ -209,6 +209,9 @@ struct rte_memseg {
>>>>>   */
>>>>>  unsigned rte_memory_get_nrank(void);
>>>>>
>>>>> +/* check memsegs iovas are within a range based on dma mask */
>>>>> +int rte_eal_check_dma_mask(uint8_t maskbits);
>>>>> +
>>>>>  /**
>>>>>   * Drivers based on uio will not load unless physical
>>>>>   * addresses are obtainable. It is only possible to get
>>>>> diff --git a/lib/librte_eal/rte_eal_version.map
>>>>> b/lib/librte_eal/rte_eal_version.map
>>>>> index f4f46c1..aa6cf87 100644
>>>>> --- a/lib/librte_eal/rte_eal_version.map
>>>>> +++ b/lib/librte_eal/rte_eal_version.map
>>>>> @@ -184,6 +184,7 @@ DPDK_17.11 {
>>>>>
>>>>>         rte_eal_create_uio_dev;
>>>>>         rte_bus_get_iommu_class;
>>>>> +       rte_eal_check_dma_mask;
>>>>>         rte_eal_has_pci;
>>>>>         rte_eal_iova_mode;
>>>>>         rte_eal_mbuf_default_mempool_ops;
>>>>> --
>>>>> 1.9.1
>>>>>
>>>>>
>>>>
>>
>>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 4/6] mem: use address hint for mapping hugepages
  2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 4/6] mem: use address hint for mapping hugepages Alejandro Lucero
@ 2018-07-10 11:15   ` Eelco Chaudron
  0 siblings, 0 replies; 62+ messages in thread
From: Eelco Chaudron @ 2018-07-10 11:15 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, stable, anatoly.burakov, maxime.coquelin, ferruh.yigit



On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:

> Linux kernel uses a really high address as starting address for
> serving mmaps calls. If there exists addressing limitations and
> IOVA mode is VA, this starting address is likely too high for
> those devices. However, it is possible to use a lower address in
> the process virtual address space as with 64 bits there is a lot
> of available space.
>
> This patch adds an address hint as starting address for 64 bits
> systems.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

Looks good to me!

Cheers,

Eelco

Acked-by: Eelco Chaudron <echaudro@redhat.com>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
  2018-07-10 11:14           ` Eelco Chaudron
@ 2018-07-10 11:33             ` Burakov, Anatoly
  2018-07-10 11:43               ` Alejandro Lucero
  2018-07-10 11:40             ` Alejandro Lucero
  1 sibling, 1 reply; 62+ messages in thread
From: Burakov, Anatoly @ 2018-07-10 11:33 UTC (permalink / raw)
  To: Eelco Chaudron, Alejandro Lucero
  Cc: dev, stable, Maxime Coquelin, Ferruh Yigit

On 10-Jul-18 12:14 PM, Eelco Chaudron wrote:
> 
> 
> On 10 Jul 2018, at 12:52, Alejandro Lucero wrote:
> 
>> On Tue, Jul 10, 2018 at 11:06 AM, Eelco Chaudron <echaudro@redhat.com>
>> wrote:
>>
>>>
>>>
>>> On 10 Jul 2018, at 11:34, Alejandro Lucero wrote:
>>>
>>> On Tue, Jul 10, 2018 at 9:56 AM, Eelco Chaudron <echaudro@redhat.com>
>>>> wrote:
>>>>
>>>>
>>>>>
>>>>> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>>>>>
>>>>> A device can suffer addressing limitations. This functions checks
>>>>>
>>>>>> memsegs have iovas within the supported range based on dma mask.
>>>>>>
>>>>>> PMD should use this during initialization if supported devices
>>>>>> suffer addressing limitations, returning an error if this function
>>>>>> returns memsegs out of range.
>>>>>>
>>>>>> Another potential usage is for emulated IOMMU hardware with 
>>>>>> addressing
>>>>>> limitations.
>>>>>>
>>>>>> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
>>>>>> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>> ---
>>>>>>  lib/librte_eal/common/eal_common_memory.c  | 33
>>>>>> ++++++++++++++++++++++++++++++
>>>>>>  lib/librte_eal/common/include/rte_memory.h |  3 +++
>>>>>>  lib/librte_eal/rte_eal_version.map         |  1 +
>>>>>>  3 files changed, 37 insertions(+)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/common/eal_common_memory.c
>>>>>> b/lib/librte_eal/common/eal_common_memory.c
>>>>>> index fc6c44d..f5efebe 100644
>>>>>> --- a/lib/librte_eal/common/eal_common_memory.c
>>>>>> +++ b/lib/librte_eal/common/eal_common_memory.c
>>>>>> @@ -109,6 +109,39 @@
>>>>>>         }
>>>>>>  }
>>>>>>
>>>>>> +/* check memseg iovas are within the required range based on dma 
>>>>>> mask
>>>>>> */
>>>>>> +int
>>>>>> +rte_eal_check_dma_mask(uint8_t maskbits)
>>>>>> +{
>>>>>> +
>>>>>> +       const struct rte_mem_config *mcfg;
>>>>>> +       uint64_t mask;
>>>>>> +       int i;
>>>>>> +
>>>>>>
>>>>>>
>>>>> I think we should add some sanity check to the input maskbits, i.e.
>>>>> [64,0)
>>>>> or [64, 32]? What would be a reasonable lower bound.
>>>>>
>>>>>
>>>>> This is not a user's API, so any invocation will be reviewed, but I 
>>>>> guess
>>>> adding a sanity check here does not harm.
>>>>
>>>> Not sure about lower bound but upper should 64, although it does not 
>>>> make
>>>> sense but it is safe. Lower bound is not so problematic.
>>>>
>>>>
>>>>
>>>>> +       /* create dma mask */
>>>>>
>>>>>> +       mask = ~((1ULL << maskbits) - 1);
>>>>>> +
>>>>>> +       /* get pointer to global configuration */
>>>>>> +       mcfg = rte_eal_get_configuration()->mem_config;
>>>>>> +
>>>>>> +       for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>> +               if (mcfg->memseg[i].addr == NULL)
>>>>>> +                       break;
>>>>>>
>>>>>
>>> Looking at some other code, it looks like NULL entries might exists. So
>>> should a continue; rather than a break; be used here?
>>>
>>>
>> I do not think so. memsegs are allocated sequentially, so first with addr
>> as NULL implies no more memsegs.
> 
> I was referring to the mem walk functions, rte_memseg_list_walk(). Maybe 
> some having more experience with this area can review/comment.

Pre-18.05, all memsegs are allocated continuously. Memseg lists and 
memseg list walk functions are 18.05+.

Alejandro, perhaps it would be worth it to tag your patchset with 
"pre-18.05" to avoid similar confusion in the future?

-- 
Thanks,
Anatoly


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
  2018-07-10 11:14           ` Eelco Chaudron
  2018-07-10 11:33             ` Burakov, Anatoly
@ 2018-07-10 11:40             ` Alejandro Lucero
  1 sibling, 0 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-10 11:40 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: dev, stable, Burakov, Anatoly, Maxime Coquelin, Ferruh Yigit

On Tue, Jul 10, 2018 at 12:14 PM, Eelco Chaudron <echaudro@redhat.com>
wrote:

>
>
> On 10 Jul 2018, at 12:52, Alejandro Lucero wrote:
>
> On Tue, Jul 10, 2018 at 11:06 AM, Eelco Chaudron <echaudro@redhat.com>
>> wrote:
>>
>>
>>>
>>> On 10 Jul 2018, at 11:34, Alejandro Lucero wrote:
>>>
>>> On Tue, Jul 10, 2018 at 9:56 AM, Eelco Chaudron <echaudro@redhat.com>
>>>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>>>>>
>>>>> A device can suffer addressing limitations. This functions checks
>>>>>
>>>>> memsegs have iovas within the supported range based on dma mask.
>>>>>>
>>>>>> PMD should use this during initialization if supported devices
>>>>>> suffer addressing limitations, returning an error if this function
>>>>>> returns memsegs out of range.
>>>>>>
>>>>>> Another potential usage is for emulated IOMMU hardware with addressing
>>>>>> limitations.
>>>>>>
>>>>>> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
>>>>>> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>> ---
>>>>>>  lib/librte_eal/common/eal_common_memory.c  | 33
>>>>>> ++++++++++++++++++++++++++++++
>>>>>>  lib/librte_eal/common/include/rte_memory.h |  3 +++
>>>>>>  lib/librte_eal/rte_eal_version.map         |  1 +
>>>>>>  3 files changed, 37 insertions(+)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/common/eal_common_memory.c
>>>>>> b/lib/librte_eal/common/eal_common_memory.c
>>>>>> index fc6c44d..f5efebe 100644
>>>>>> --- a/lib/librte_eal/common/eal_common_memory.c
>>>>>> +++ b/lib/librte_eal/common/eal_common_memory.c
>>>>>> @@ -109,6 +109,39 @@
>>>>>>         }
>>>>>>  }
>>>>>>
>>>>>> +/* check memseg iovas are within the required range based on dma mask
>>>>>> */
>>>>>> +int
>>>>>> +rte_eal_check_dma_mask(uint8_t maskbits)
>>>>>> +{
>>>>>> +
>>>>>> +       const struct rte_mem_config *mcfg;
>>>>>> +       uint64_t mask;
>>>>>> +       int i;
>>>>>> +
>>>>>>
>>>>>>
>>>>>> I think we should add some sanity check to the input maskbits, i.e.
>>>>> [64,0)
>>>>> or [64, 32]? What would be a reasonable lower bound.
>>>>>
>>>>>
>>>>> This is not a user's API, so any invocation will be reviewed, but I
>>>>> guess
>>>>>
>>>> adding a sanity check here does not harm.
>>>>
>>>> Not sure about lower bound but upper should 64, although it does not
>>>> make
>>>> sense but it is safe. Lower bound is not so problematic.
>>>>
>>>>
>>>>
>>>> +       /* create dma mask */
>>>>>
>>>>> +       mask = ~((1ULL << maskbits) - 1);
>>>>>> +
>>>>>> +       /* get pointer to global configuration */
>>>>>> +       mcfg = rte_eal_get_configuration()->mem_config;
>>>>>> +
>>>>>> +       for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>> +               if (mcfg->memseg[i].addr == NULL)
>>>>>> +                       break;
>>>>>>
>>>>>>
>>>>> Looking at some other code, it looks like NULL entries might exists. So
>>> should a continue; rather than a break; be used here?
>>>
>>>
>>> I do not think so. memsegs are allocated sequentially, so first with addr
>> as NULL implies no more memsegs.
>>
>
> I was referring to the mem walk functions, rte_memseg_list_walk(). Maybe
> some having more experience with this area can review/comment.
>
>
This patchset applies to 17.11.3, which does not implement that function.

See what rte_eal_get_physmem_size and rte_dump_physmem_layout, in
lib/librte_eal/common/eal_common_memory.c, do during their memseg "walks"
when addr is NULL: both stop at the first NULL entry.
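
For illustration only, here is a minimal sketch of the check discussed above, combining the contiguous pre-18.05 style walk (stop at the first NULL addr) with the maskbits sanity check suggested in the review. It is not the patch code: struct memseg, MAX_MEMSEG and check_dma_mask are hypothetical stand-ins for struct rte_memseg, RTE_MAX_MEMSEG and rte_eal_check_dma_mask, and the accepted range [1, 64] is an assumption, since the thread does not settle on a lower bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MEMSEG 8 /* hypothetical stand-in for RTE_MAX_MEMSEG */

/* Hypothetical, reduced stand-in for struct rte_memseg. */
struct memseg {
	void *addr;    /* virtual address; NULL marks the first unused slot */
	uint64_t iova; /* IO virtual (or physical) address of the segment */
};

/*
 * Return 0 if every used segment fits in maskbits address bits,
 * -1 if a segment is out of range or maskbits is not in [1, 64].
 */
static int
check_dma_mask(const struct memseg *segs, uint8_t maskbits)
{
	uint64_t mask;
	int i;

	/* Assumed bounds: the thread only agrees that 64 is the upper limit. */
	if (maskbits == 0 || maskbits > 64)
		return -1;

	/* Shifting a 64-bit value by 64 is undefined in C, so special-case it. */
	mask = (maskbits == 64) ? 0 : ~((1ULL << maskbits) - 1);

	for (i = 0; i < MAX_MEMSEG; i++) {
		/* Segments are filled sequentially: the first NULL ends the walk. */
		if (segs[i].addr == NULL)
			break;
		/* Any bit set above maskbits means the IOVA is out of range. */
		if (segs[i].iova & mask)
			return -1;
	}
	return 0;
}
```

With one segment at IOVA 1 << 39, a call with maskbits 40 accepts the layout and a call with maskbits 39 rejects it, which matches the 40-bit (NFP) and 39-bit (VT-d) cases from the cover letter.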




>
>
>>
>>
>>> +
>>>
>>>> +               if (mcfg->memseg[i].iova & mask) {
>>>>>> +                       RTE_LOG(INFO, EAL,
>>>>>> +                               "memseg[%d] iova %"PRIx64" out of
>>>>>> range:\n",
>>>>>> +                               i, mcfg->memseg[i].iova);
>>>>>> +
>>>>>> +                       RTE_LOG(INFO, EAL, "\tusing dma mask
>>>>>> %"PRIx64"\n",
>>>>>> +                               mask);
>>>>>> +                       return -1;
>>>>>> +               }
>>>>>> +       }
>>>>>> +
>>>>>> +       return 0;
>>>>>> +}
>>>>>> +
>>>>>>  /* return the number of memory channels */
>>>>>>  unsigned rte_memory_get_nchannel(void)
>>>>>>  {
>>>>>> diff --git a/lib/librte_eal/common/include/rte_memory.h
>>>>>> b/lib/librte_eal/common/include/rte_memory.h
>>>>>> index 80a8fc0..b2a0168 100644
>>>>>> --- a/lib/librte_eal/common/include/rte_memory.h
>>>>>> +++ b/lib/librte_eal/common/include/rte_memory.h
>>>>>> @@ -209,6 +209,9 @@ struct rte_memseg {
>>>>>>   */
>>>>>>  unsigned rte_memory_get_nrank(void);
>>>>>>
>>>>>> +/* check memsegs iovas are within a range based on dma mask */
>>>>>> +int rte_eal_check_dma_mask(uint8_t maskbits);
>>>>>> +
>>>>>>  /**
>>>>>>   * Drivers based on uio will not load unless physical
>>>>>>   * addresses are obtainable. It is only possible to get
>>>>>> diff --git a/lib/librte_eal/rte_eal_version.map
>>>>>> b/lib/librte_eal/rte_eal_version.map
>>>>>> index f4f46c1..aa6cf87 100644
>>>>>> --- a/lib/librte_eal/rte_eal_version.map
>>>>>> +++ b/lib/librte_eal/rte_eal_version.map
>>>>>> @@ -184,6 +184,7 @@ DPDK_17.11 {
>>>>>>
>>>>>>         rte_eal_create_uio_dev;
>>>>>>         rte_bus_get_iommu_class;
>>>>>> +       rte_eal_check_dma_mask;
>>>>>>         rte_eal_has_pci;
>>>>>>         rte_eal_iova_mode;
>>>>>>         rte_eal_mbuf_default_mempool_ops;
>>>>>> --
>>>>>> 1.9.1
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>
>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
  2018-07-10 11:33             ` Burakov, Anatoly
@ 2018-07-10 11:43               ` Alejandro Lucero
  2018-07-10 11:55                 ` Eelco Chaudron
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-10 11:43 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: Eelco Chaudron, dev, stable, Maxime Coquelin, Ferruh Yigit

On Tue, Jul 10, 2018 at 12:33 PM, Burakov, Anatoly <
anatoly.burakov@intel.com> wrote:

> On 10-Jul-18 12:14 PM, Eelco Chaudron wrote:
>
>>
>>
>> On 10 Jul 2018, at 12:52, Alejandro Lucero wrote:
>>
>> On Tue, Jul 10, 2018 at 11:06 AM, Eelco Chaudron <echaudro@redhat.com>
>>> wrote:
>>>
>>>
>>>>
>>>> On 10 Jul 2018, at 11:34, Alejandro Lucero wrote:
>>>>
>>>> On Tue, Jul 10, 2018 at 9:56 AM, Eelco Chaudron <echaudro@redhat.com>
>>>>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>>>>>>
>>>>>> A device can suffer addressing limitations. This functions checks
>>>>>>
>>>>>> memsegs have iovas within the supported range based on dma mask.
>>>>>>>
>>>>>>> PMD should use this during initialization if supported devices
>>>>>>> suffer addressing limitations, returning an error if this function
>>>>>>> returns memsegs out of range.
>>>>>>>
>>>>>>> Another potential usage is for emulated IOMMU hardware with
>>>>>>> addressing
>>>>>>> limitations.
>>>>>>>
>>>>>>> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
>>>>>>> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>>> ---
>>>>>>>  lib/librte_eal/common/eal_common_memory.c  | 33
>>>>>>> ++++++++++++++++++++++++++++++
>>>>>>>  lib/librte_eal/common/include/rte_memory.h |  3 +++
>>>>>>>  lib/librte_eal/rte_eal_version.map         |  1 +
>>>>>>>  3 files changed, 37 insertions(+)
>>>>>>>
>>>>>>> diff --git a/lib/librte_eal/common/eal_common_memory.c
>>>>>>> b/lib/librte_eal/common/eal_common_memory.c
>>>>>>> index fc6c44d..f5efebe 100644
>>>>>>> --- a/lib/librte_eal/common/eal_common_memory.c
>>>>>>> +++ b/lib/librte_eal/common/eal_common_memory.c
>>>>>>> @@ -109,6 +109,39 @@
>>>>>>>         }
>>>>>>>  }
>>>>>>>
>>>>>>> +/* check memseg iovas are within the required range based on dma
>>>>>>> mask
>>>>>>> */
>>>>>>> +int
>>>>>>> +rte_eal_check_dma_mask(uint8_t maskbits)
>>>>>>> +{
>>>>>>> +
>>>>>>> +       const struct rte_mem_config *mcfg;
>>>>>>> +       uint64_t mask;
>>>>>>> +       int i;
>>>>>>> +
>>>>>>>
>>>>>>>
>>>>>>> I think we should add some sanity check to the input maskbits, i.e.
>>>>>> [64,0)
>>>>>> or [64, 32]? What would be a reasonable lower bound.
>>>>>>
>>>>>>
>>>>>> This is not a user's API, so any invocation will be reviewed, but I
>>>>>> guess
>>>>>>
>>>>> adding a sanity check here does not harm.
>>>>>
>>>>> Not sure about lower bound but upper should 64, although it does not
>>>>> make
>>>>> sense but it is safe. Lower bound is not so problematic.
>>>>>
>>>>>
>>>>>
>>>>> +       /* create dma mask */
>>>>>>
>>>>>> +       mask = ~((1ULL << maskbits) - 1);
>>>>>>> +
>>>>>>> +       /* get pointer to global configuration */
>>>>>>> +       mcfg = rte_eal_get_configuration()->mem_config;
>>>>>>> +
>>>>>>> +       for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>>> +               if (mcfg->memseg[i].addr == NULL)
>>>>>>> +                       break;
>>>>>>>
>>>>>>>
>>>>>> Looking at some other code, it looks like NULL entries might exists.
>>>> So
>>>> should a continue; rather than a break; be used here?
>>>>
>>>>
>>>> I do not think so. memsegs are allocated sequentially, so first with
>>> addr
>>> as NULL implies no more memsegs.
>>>
>>
>> I was referring to the mem walk functions, rte_memseg_list_walk(). Maybe
>> some having more experience with this area can review/comment.
>>
>
> Pre-18.05, all memsegs are allocated continuously. Memseg lists and memseg
> list walk functions are 18.05+.
>
> Alejandro, perhaps it would be worth it to tag your patchset with
> "pre-18.05" to avoid similar confusion in the future?
>
>
Yes, that will help. I'm sending a new version shortly and I'll make it
clear.



> --
> Thanks,
> Anatoly
>


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
  2018-07-10 11:43               ` Alejandro Lucero
@ 2018-07-10 11:55                 ` Eelco Chaudron
  0 siblings, 0 replies; 62+ messages in thread
From: Eelco Chaudron @ 2018-07-10 11:55 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: Burakov, Anatoly, dev, stable, Maxime Coquelin, Ferruh Yigit



On 10 Jul 2018, at 13:43, Alejandro Lucero wrote:

> On Tue, Jul 10, 2018 at 12:33 PM, Burakov, Anatoly <
> anatoly.burakov@intel.com> wrote:
>
>> On 10-Jul-18 12:14 PM, Eelco Chaudron wrote:
>>
>>>
>>>
>>> On 10 Jul 2018, at 12:52, Alejandro Lucero wrote:
>>>
>>> On Tue, Jul 10, 2018 at 11:06 AM, Eelco Chaudron 
>>> <echaudro@redhat.com>
>>>> wrote:
>>>>
>>>>
>>>>>
>>>>> On 10 Jul 2018, at 11:34, Alejandro Lucero wrote:
>>>>>
>>>>> On Tue, Jul 10, 2018 at 9:56 AM, Eelco Chaudron 
>>>>> <echaudro@redhat.com>
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>>>>>>>
>>>>>>> A device can suffer addressing limitations. This functions 
>>>>>>> checks
>>>>>>>
>>>>>>> memsegs have iovas within the supported range based on dma mask.
>>>>>>>>
>>>>>>>> PMD should use this during initialization if supported devices
>>>>>>>> suffer addressing limitations, returning an error if this 
>>>>>>>> function
>>>>>>>> returns memsegs out of range.
>>>>>>>>
>>>>>>>> Another potential usage is for emulated IOMMU hardware with
>>>>>>>> addressing
>>>>>>>> limitations.
>>>>>>>>
>>>>>>>> Signed-off-by: Alejandro Lucero 
>>>>>>>> <alejandro.lucero@netronome.com>
>>>>>>>> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>>>> ---
>>>>>>>>  lib/librte_eal/common/eal_common_memory.c  | 33
>>>>>>>> ++++++++++++++++++++++++++++++
>>>>>>>>  lib/librte_eal/common/include/rte_memory.h |  3 +++
>>>>>>>>  lib/librte_eal/rte_eal_version.map         |  1 +
>>>>>>>>  3 files changed, 37 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/lib/librte_eal/common/eal_common_memory.c
>>>>>>>> b/lib/librte_eal/common/eal_common_memory.c
>>>>>>>> index fc6c44d..f5efebe 100644
>>>>>>>> --- a/lib/librte_eal/common/eal_common_memory.c
>>>>>>>> +++ b/lib/librte_eal/common/eal_common_memory.c
>>>>>>>> @@ -109,6 +109,39 @@
>>>>>>>>         }
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +/* check memseg iovas are within the required range based on 
>>>>>>>> dma
>>>>>>>> mask
>>>>>>>> */
>>>>>>>> +int
>>>>>>>> +rte_eal_check_dma_mask(uint8_t maskbits)
>>>>>>>> +{
>>>>>>>> +
>>>>>>>> +       const struct rte_mem_config *mcfg;
>>>>>>>> +       uint64_t mask;
>>>>>>>> +       int i;
>>>>>>>> +
>>>>>>>>
>>>>>>>>
>>>>>>>> I think we should add some sanity check to the input maskbits, 
>>>>>>>> i.e.
>>>>>>> [64,0)
>>>>>>> or [64, 32]? What would be a reasonable lower bound.
>>>>>>>
>>>>>>>
>>>>>>> This is not a user's API, so any invocation will be reviewed, 
>>>>>>> but I
>>>>>>> guess
>>>>>>>
>>>>>> adding a sanity check here does not harm.
>>>>>>
>>>>>> Not sure about lower bound but upper should 64, although it does 
>>>>>> not
>>>>>> make
>>>>>> sense but it is safe. Lower bound is not so problematic.
>>>>>>
>>>>>>
>>>>>>
>>>>>> +       /* create dma mask */
>>>>>>>
>>>>>>> +       mask = ~((1ULL << maskbits) - 1);
>>>>>>>> +
>>>>>>>> +       /* get pointer to global configuration */
>>>>>>>> +       mcfg = rte_eal_get_configuration()->mem_config;
>>>>>>>> +
>>>>>>>> +       for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>>>>>>>> +               if (mcfg->memseg[i].addr == NULL)
>>>>>>>> +                       break;
>>>>>>>>
>>>>>>>>
>>>>>>> Looking at some other code, it looks like NULL entries might 
>>>>>>> exists.
>>>>> So
>>>>> should a continue; rather than a break; be used here?
>>>>>
>>>>>
>>>>> I do not think so. memsegs are allocated sequentially, so first 
>>>>> with
>>>> addr
>>>> as NULL implies no more memsegs.
>>>>
>>>
>>> I was referring to the mem walk functions, rte_memseg_list_walk(). 
>>> Maybe
>>> some having more experience with this area can review/comment.
>>>
>>
>> Pre-18.05, all memsegs are allocated continuously. Memseg lists and 
>> memseg
>> list walk functions are 18.05+.
>>
>> Alejandro, perhaps it would be worth it to tag your patchset with
>> "pre-18.05" to avoid similar confusion in the future?
>>
>>
> Yes, that will help. I'm sending a new version shortly and I'll make 
> it
> clear.

Thanks, I’ll review the new version if it’s ready before the end of 
tomorrow CET, as I will be on PTO.


* Re: [dpdk-dev] [dpdk-stable] [PATCH v3 3/6] bus/pci: use IOVAs check when setting IOVA mode
  2018-07-10 10:14   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
@ 2018-07-10 15:37     ` Alejandro Lucero
  0 siblings, 0 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-07-10 15:37 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: dev, stable, Burakov, Anatoly, Maxime Coquelin, Ferruh Yigit

On Tue, Jul 10, 2018 at 11:14 AM, Eelco Chaudron <echaudro@redhat.com>
wrote:

>
>
> On 4 Jul 2018, at 14:53, Alejandro Lucero wrote:
>
> Although VT-d emulation currently only supports 39 bits, it could
>> be iovas being within that supported range. This patch allows
>> IOVA mode in such a case.
>>
>> Indeed, memory initialization code can be modified for using lower
>> virtual addresses than those used by the kernel for 64 bits processes
>> by default, and therefore memsegs iovas can use 39 bits or less for
>> most system. And this is likely 100% true for VMs.
>>
>> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
>> ---
>>  drivers/bus/pci/linux/pci.c | 15 +++++++++++----
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
>> index 74deef3..792c819 100644
>> --- a/drivers/bus/pci/linux/pci.c
>> +++ b/drivers/bus/pci/linux/pci.c
>> @@ -43,6 +43,7 @@
>>  #include <rte_devargs.h>
>>  #include <rte_memcpy.h>
>>  #include <rte_vfio.h>
>> +#include <rte_memory.h>
>>
>>  #include "eal_private.h"
>>  #include "eal_filesystem.h"
>> @@ -613,10 +614,12 @@
>>         fclose(fp);
>>
>>         mgaw = ((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT)
>> + 1;
>> -       if (mgaw < X86_VA_WIDTH)
>> +
>> +       if (!rte_eal_check_dma_mask(mgaw))
>>
>
> I think in this case we still need to check X86_VA_WIDTH, i.e.
> if (mgaw < X86_VA_WIDTH && !rte_eal_check_dma_mask(mgaw))
>
>
> +               return true;
>> +       else
>>                 return false;
>>
>> -       return true;
>>  }
>>  #elif defined(RTE_ARCH_PPC_64)
>>  static bool
>> @@ -640,13 +643,17 @@
>>  {
>>         struct rte_pci_device *dev = NULL;
>>         struct rte_pci_driver *drv = NULL;
>> +       int iommu_dma_mask_check_done = 0;
>>
>>         FOREACH_DRIVER_ON_PCIBUS(drv) {
>>                 FOREACH_DEVICE_ON_PCIBUS(dev) {
>>                         if (!rte_pci_match(drv, dev))
>>                                 continue;
>> -                       if (!pci_one_device_iommu_support_va(dev))
>> -                               return false;
>> +                       if (!iommu_dma_mask_check_done) {
>> +                               if (pci_one_device_iommu_support_va(dev)
>> < 0)
>> +                                       return false;
>> +                               iommu_dma_mask_check_done  = 1;
>>
>
> Not sure why this change? Why do we only need to check one device on the
> bus?
>
>
Because there is just one emulated IOMMU. The limitation in this case is
not in a specific PCI device, and I do not think it is possible to have two
different IOMMU types (emulated or not) in one system. You can have more
than one IOMMU controller, but they will all be of the same type.


> In addition, if this is what was intended, rather than a variable you can
> return true in this case, or did you intend to clear
> iommu_dma_mask_check_done on every PCI bus iteration?
>
>
If pci_one_device_iommu_support_va finds, through the DMA mask check, that
the IOVAs are out of range, then the IOVA mode is PA and no further checks
are required. But some other PCI device could still preclude IOVA VA, so
all the PCI devices need to be processed.
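
As a rough sketch of the idea described above (evaluate the system-wide IOMMU limit once, but still visit every device), assuming a simplified device model: struct dev, the needs_pa flag, iommu_supports_va and devices_support_va are all hypothetical names for illustration and are not part of the DPDK PCI bus code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record; not a DPDK structure. */
struct dev {
	bool bound;    /* a driver matched this device */
	bool needs_pa; /* device-specific reason to rule out IOVA VA */
};

/* Counts calls so the "once only" behaviour is observable. */
static int iommu_checks;

/* Hypothetical stand-in for the IOMMU-wide addressing check. */
static bool
iommu_supports_va(bool iovas_in_range)
{
	iommu_checks++;
	return iovas_in_range;
}

/* Return true if IOVA VA mode can be used for all bound devices. */
static bool
devices_support_va(const struct dev *devs, size_t n, bool iovas_in_range)
{
	bool iommu_checked = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!devs[i].bound)
			continue;
		/* The IOMMU limit is system-wide: evaluate it a single time. */
		if (!iommu_checked) {
			if (!iommu_supports_va(iovas_in_range))
				return false;
			iommu_checked = true;
		}
		/* Per-device constraints still need every device visited. */
		if (devs[i].needs_pa)
			return false;
	}
	return true;
}
```

The flag only skips repeating the IOMMU check; it does not end the loop early, so a later device can still rule out IOVA VA.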


> +                       }
>>                 }
>>         }
>>         return true;
>> --
>> 1.9.1
>>
>


* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 14:57                                         ` Alejandro Lucero
@ 2018-10-30 15:09                                           ` Lin, Xueqin
  0 siblings, 0 replies; 62+ messages in thread
From: Lin, Xueqin @ 2018-10-30 15:09 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: Yao, Lei A, Thomas Monjalon, dev, Xu, Qian Q, Burakov, Anatoly,
	Yigit, Ferruh, Zhang, Qi Z

Hi Lucero,

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Tuesday, October 30, 2018 10:57 PM
To: Lin, Xueqin <xueqin.lin@intel.com>
Cc: Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 2:45 PM Lin, Xueqin <xueqin.lin@intel.com> wrote:
Hi Lucero,

The patch fixes both the testpmd and the multi-process setup failures in my environment.
I hope you can submit the fix to the community patches page.
Thanks a lot.


Great.

I need to format the patchset properly and clean things up but I hope I can send a patchset this week.

Thanks for testing!

By the way, is this testing something you are doing on your own, or is it part of Intel DPDK work?


We are from the Intel DPDK validation team☺
We are in the 18.11 rc1 cycle, and this issue blocks most of our test cases, including NIC, NIC VF, vhost/virtio, samples…
It is very urgent for us, as we have very limited time for DPDK QA.
I hope you can send the fix patch officially soon, so that it can be merged to the master branch after review.
Thanks.
Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Tuesday, October 30, 2018 10:05 PM
To: Lin, Xueqin <xueqin.lin@intel.com>
Cc: Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero <alejandro.lucero@netronome.com> wrote:

On Tue, Oct 30, 2018 at 12:22 PM Lin, Xueqin <xueqin.lin@intel.com> wrote:
Some findings from some of our servers:
Without "intel_iommu=on iommu=pt" in the /boot/grub2/grub.cfg file (rebooting to make the change effective):
18.11 rc1: testpmd and the secondary process set up successfully.

With "intel_iommu=on iommu=pt" added to the /boot/grub2/grub.cfg file (rebooting to make the change effective):
18.11 rc1: testpmd and the secondary process fail to set up.
18.11 rc1 + dma_mask_fix patch: testpmd sets up successfully, but the secondary process fails to set up.

Whether "intel_iommu=on iommu=pt" is enabled may explain the gap between our test results.
Most of our team's servers need the IOMMU enabled for VT-d and vfio tests.


That makes sense, because the problem occurs when the IOVA mode is set inside drivers/bus/pci/linux/pci.c; if there is no IOMMU, rte_eal_check_dma_mask is not called at all.


Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Tuesday, October 30, 2018 6:38 PM
To: Lin, Xueqin <xueqin.lin@intel.com>
Cc: Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:
Hi Lucero,

No, we have reproduced the multi-process issues (including symmetric_mp, simple_mp, hotplug_mp, the multi-process unit test…) on most of our servers.
Strangely, 1~2 servers do not have the issue.


Yes, you are right. I could execute it, but only because of how this problem is triggered.
I think I can fix this and, at the same time, properly solve the initial issue without limitations such as the potential race condition I mentioned.
I can give you a patch to try in a couple of hours.


Hi Lin,

Can you try the patch attached?

Thanks

Thanks

Bind two NNT ports or FVL ports

./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]

Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0  0x00000000005566b5 in rte_fbarray_find_next_used ()
#1  0x000000000054da9c in rte_eal_check_dma_mask ()
#2  0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3  0x0000000000573988 in rte_pci_get_iommu_class ()
#4  0x000000000054f743 in rte_bus_get_iommu_class ()
#5  0x000000000053c123 in rte_eal_init ()
#6  0x000000000046be2b in main ()

Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Tuesday, October 30, 2018 5:41 PM
To: Lin, Xueqin <xueqin.lin@intel.com>
Cc: Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:
Hi Lucero & Thomas,

We found that the patch does not fix the multi-process cases.

Hi,

I think it is not specifically about multiprocess but about hotplug with multiprocess, because I can execute symmetric_mp successfully with a secondary process.

Working on this as a priority.

Thanks.

Steps:

1. Set up the primary process (succeeds)

./hotplug_mp --proc-type=auto



2. Set up the secondary process (fails)

./hotplug_mp --proc-type=auto

EAL: Detected 88 lcore(s)

EAL: Detected 2 NUMA nodes

EAL: Auto-detected process type: SECONDARY

EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23

Segmentation fault (core dumped)


More information as below:

Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.

0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

264             for (idx = first; idx < msk->n_masks; idx++) {

#0  0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

#1  0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,

    used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001

#2  0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)

    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018

#3  0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,

    arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589

#4  0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')

    at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465

#5  0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)

    at /root/dpdk/drivers/bus/pci/linux/pci.c:593

#6  0x00000000005b9738 in pci_devices_iommu_support_va ()

    at /root/dpdk/drivers/bus/pci/linux/pci.c:626

#7  0x00000000005b97a7 in rte_pci_get_iommu_class ()

    at /root/dpdk/drivers/bus/pci/linux/pci.c:650

#8  0x000000000058f1ce in rte_bus_get_iommu_class ()

    at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237

#9  0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)

    at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919

#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)

    at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28


Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 9:41 PM
To: Yao, Lei A <lei.a.yao@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:


From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net> wrote:
29/10/2018 12:39, Alejandro Lucero:
> I have a patch that fixes a bug where rte_eal_check_dma_mask was called
> with the mask instead of the maskbits. However, this does not solve the deadlock.

The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?

Hi, Lucero

This patch fixes the issue on my side. Thanks a lot
for your quick action.


Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think that was when legacy_mem was still in place, so the dynamic memory allocation code was not used during memory initialization.

There is something that concerns me though. Using rte_memseg_walk_thread_unsafe could be a problem in some situations, although those situations are unlikely.

Usually, rte_eal_check_dma_mask is called during initialization, where it is safe to use the unsafe function for walking memsegs. But with device hotplug and dynamic memory allocation, there is a potential race condition when the primary process is allocating more memory while, concurrently, a device is hotplugged and a secondary process does the device initialization. For now, this is only a problem with the NFP, and the race window is very unlikely to be hit, but I will work on this asap.

BRs
Lei

> Interestingly, the problem looks like a compiler one. Calling
> rte_memseg_walk does not return when called inside rte_eal_check_dma_mask,
> but if you modify the call like this:
>
> -       if (rte_memseg_walk(check_iova, &mask))
> +       if (!rte_memseg_walk(check_iova, &mask))
>
> it works, although the value returned to the invoker changes, of course.
> But the point here is that calling rte_memseg_walk should behave the same
> as before, and it does not.

Anyway, the coding style requires saving the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
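
A small sketch of that style point, using a hypothetical walk() helper in place of rte_memseg_walk (the callback convention of "non-zero stops the walk" is an assumption of this sketch, as are the names walk, check_iova and check_range):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback type and walker, standing in for rte_memseg_walk(). */
typedef int (*walk_fn)(uint64_t iova, void *arg);

static int
walk(const uint64_t *iovas, int n, walk_fn fn, void *arg)
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = fn(iovas[i], arg);
		if (ret != 0)
			return ret; /* a non-zero result stops the walk */
	}
	return 0;
}

/* Callback: report 1 when the IOVA has bits set above the mask. */
static int
check_iova(uint64_t iova, void *arg)
{
	const uint64_t *mask = arg;

	return (iova & *mask) ? 1 : 0;
}

static int
check_range(const uint64_t *iovas, int n, uint64_t mask)
{
	int ret;

	/* Store the result first, then compare it explicitly with 0. */
	ret = walk(iovas, n, check_iova, &mask);
	if (ret != 0)
		return -1;
	return 0;
}
```

Keeping the return value in ret also makes it easy to log or inspect in a debugger before the comparison.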

PS: please do not top post and avoid HTML emails, thanks


* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 14:45                                       ` Lin, Xueqin
@ 2018-10-30 14:57                                         ` Alejandro Lucero
  2018-10-30 15:09                                           ` Lin, Xueqin
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-30 14:57 UTC (permalink / raw)
  To: xueqin.lin
  Cc: lei.a.yao, Thomas Monjalon, dev, Xu, Qian Q, Burakov, Anatoly,
	Ferruh Yigit, Qi Zhang

On Tue, Oct 30, 2018 at 2:45 PM Lin, Xueqin <xueqin.lin@intel.com> wrote:

> Hi Lucero,
>
>
>
> The patch fixes both the testpmd and the multi-process setup failures in
> my environment.
>
> I hope you can submit the fix to the community patches page.
>
> Thanks a lot.
>
>
>

Great.

I need to format the patchset properly and clean things up but I hope I can
send a patchset this week.

Thanks for testing!

By the way, is this testing something you are doing on your own, or is it
part of Intel DPDK work?



> Best regards,
>
> Xueqin
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Tuesday, October 30, 2018 10:05 PM
> *To:* Lin, Xueqin <xueqin.lin@intel.com>
> *Cc:* Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <
> thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <
> ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero <
> alejandro.lucero@netronome.com> wrote:
>
>
>
> On Tue, Oct 30, 2018 at 12:22 PM Lin, Xueqin <xueqin.lin@intel.com> wrote:
>
> Some findings from some of our servers:
>
> Without "intel_iommu=on iommu=pt" in the /boot/grub2/grub.cfg file
> (rebooting to make the change effective):
>
> 18.11 rc1: testpmd and the secondary process set up successfully.
>
>
>
> With "intel_iommu=on iommu=pt" added to the /boot/grub2/grub.cfg file
> (rebooting to make the change effective):
>
> 18.11 rc1: testpmd and the secondary process fail to set up.
>
> 18.11 rc1 + dma_mask_fix patch: testpmd sets up successfully, but the
> secondary process fails to set up.
>
>
>
> Whether "intel_iommu=on iommu=pt" is enabled may explain the gap between
> our test results.
>
> Most of our team's servers need the IOMMU enabled for VT-d and vfio tests.
>
>
>
>
>
> That makes sense, because the problem occurs when the IOVA mode is set
> inside drivers/bus/pci/linux/pci.c; if there is no IOMMU,
> rte_eal_check_dma_mask is not called at all.
>
>
>
>
>
> Best regards,
>
> Xueqin
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Tuesday, October 30, 2018 6:38 PM
> *To:* Lin, Xueqin <xueqin.lin@intel.com>
> *Cc:* Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <
> thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <
> ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:
>
> Hi Lucero,
>
>
>
> No, we have reproduced multi-process issues(include symmetric_mp,
> simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
>
> It is also strange that 1~2 servers don’t have the issue.
>
>
>
>
>
> Yes, you are right. I could execute it but it was due to how this problem
> triggers.
>
> I think I can fix this and at the same time solving properly the initial
> issue without any limitation like that potential race condition I
> mentioned.
>
> I can give you a patch to try in a couple of hours.
>
>
>
>
>
> Hi Lin,
>
>
>
> Can you try the patch attached?
>
>
>
> Thanks
>
>
>
> Thanks
>
>
>
> Bind two NNT ports or FVL ports
>
>
>
> ./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4
> --proc-id=1
>
>
>
> EAL: Detected 88 lcore(s)
>
> EAL: Detected 2 NUMA nodes
>
> EAL: Auto-detected process type: SECONDARY
>
> [New Thread 0x7ffff6eda700 (LWP 90103)]
>
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
>
> [New Thread 0x7ffff66d9700 (LWP 90104)]
>
>
>
> Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
>
> 0x00000000005566b5 in rte_fbarray_find_next_used ()
>
> (gdb) bt
>
> #0  0x00000000005566b5 in rte_fbarray_find_next_used ()
>
> #1  0x000000000054da9c in rte_eal_check_dma_mask ()
>
> #2  0x0000000000572ae7 in pci_one_device_iommu_support_va ()
>
> #3  0x0000000000573988 in rte_pci_get_iommu_class ()
>
> #4  0x000000000054f743 in rte_bus_get_iommu_class ()
>
> #5  0x000000000053c123 in rte_eal_init ()
>
> #6  0x000000000046be2b in main ()
>
>
>
> Best regards,
>
> Xueqin
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Tuesday, October 30, 2018 5:41 PM
> *To:* Lin, Xueqin <xueqin.lin@intel.com>
> *Cc:* Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <
> thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <
> ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:
>
> Hi Lucero&Thomas,
>
>
>
> Find the patch can’t fix multi-process cases.
>
>
>
> Hi,
>
>
>
> I think it is not specifically about multiprocess but about hotplug with
> multiprocess because I can execute the symmetric_mp successfully with a
> secondary process.
>
>
>
> Working on this as a priority.
>
>
>
> Thanks.
>
>
>
> Steps:
>
> 1.       Setup primary process successfully
>
> ./hotplug_mp --proc-type=auto
>
>
>
> 2.       Fail to setup secondary process
>
> ./hotplug_mp --proc-type=auto
>
> EAL: Detected 88 lcore(s)
>
> EAL: Detected 2 NUMA nodes
>
> EAL: Auto-detected process type: SECONDARY
>
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
>
> Segmentation fault (core dumped)
>
>
>
> More information as below:
>
> Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
>
> 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
>
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
>
> 264             for (idx = first; idx < msk->n_masks; idx++) {
>
> #0  0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0,
> used=true)
>
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
>
> #1  0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0,
> next=true,
>
>     used=true) at
> /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
>
> #2  0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4,
> start=0)
>
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
>
> #3  0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401
> <check_iova>,
>
>     arg=0x7fffffffcc38) at
> /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
>
> #4  0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
>
>     at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
>
> #5  0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
>
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:593
>
> #6  0x00000000005b9738 in pci_devices_iommu_support_va ()
>
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:626
>
> #7  0x00000000005b97a7 in rte_pci_get_iommu_class ()
>
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:650
>
> #8  0x000000000058f1ce in rte_bus_get_iommu_class ()
>
>     at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
>
> #9  0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
>
>     at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
>
> #10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
>
>     at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28
>
>
>
>
>
> Best regards,
>
> Xueqin
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Monday, October 29, 2018 9:41 PM
> *To:* Yao, Lei A <lei.a.yao@intel.com>
> *Cc:* Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian
> Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov,
> Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com
> >
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:
>
>
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Monday, October 29, 2018 8:56 PM
> *To:* Thomas Monjalon <thomas@monjalon.net>
> *Cc:* Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <
> qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov,
> Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com
> >
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
> wrote:
>
> 29/10/2018 12:39, Alejandro Lucero:
> > I have a patch that fixes a bug where rte_eal_check_dma_mask was called
> > with the mask instead of the maskbits. However, this does not solve the
> > deadlock.
>
> The deadlock is a bigger concern I think.
>
>
>
> I think once the call to rte_eal_check_dma_mask uses the maskbits instead
> of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
>
>
>
> Yao, can you try with the attached patch?
>
>
>
> Hi, Lucero
>
>
>
> This patch can fix the issue at my side. Thanks a lot
>
> for you quick action.
>
>
>
>
>
> Great!
>
>
>
> I will send an official patch with the changes.
>
>
>
> I have to say that I tested the patchset, but I think it was where
> legacy_mem was still there and therefore dynamic memory allocation code not
> used during memory initialization.
>
>
>
> There is something that concerns me though. Using
> rte_memseg_walk_thread_unsafe could be a problem under some situations
> although those situations being unlikely.
>
>
>
> Usually, calling rte_eal_check_dma_mask happens during initialization.
> Then it is safe to use the unsafe function for walking memsegs, but with
> device hotplug and dynamic memory allocation, there exists a potential race
> condition when the primary process is allocating more memory and
> concurrently a device is hotplugged and a secondary process does the device
> initialization. By now, this is just a problem with the NFP, and the
> potential race condition window really unlikely, but I will work on this
> asap.
>
>
>
> BRs
>
> Lei
>
>
>
> > Interestingly, the problem looks like a compiler one. The call to
> > rte_memseg_walk does not return when made inside rte_eal_check_dma_mask,
> > but if you modify the call like this:
> >
> > -       if (rte_memseg_walk(check_iova, &mask))
> > +       if (!rte_memseg_walk(check_iova, &mask))
> >
> > it works, although the value returned to the invoker changes, of course.
> > But the point here is that calling rte_memseg_walk should behave the same
> > as before, and it does not.
>
> Anyway, the coding style requires saving the return value in a variable
> instead of nesting the call in an "if" condition.
> And the "if" check should be explicitly != 0 because it is not a real
> boolean.
>
> PS: please do not top post and avoid HTML emails, thanks
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 14:04                                     ` Alejandro Lucero
  2018-10-30 14:14                                       ` Burakov, Anatoly
@ 2018-10-30 14:45                                       ` Lin, Xueqin
  2018-10-30 14:57                                         ` Alejandro Lucero
  1 sibling, 1 reply; 62+ messages in thread
From: Lin, Xueqin @ 2018-10-30 14:45 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: Yao, Lei A, Thomas Monjalon, dev, Xu, Qian Q, Burakov, Anatoly,
	Yigit, Ferruh, Zhang, Qi Z

Hi Lucero,

The patch fixes both the testpmd and the multi-process setup failures in my environment.
I hope you can submit the fix to the community patches page.
Thanks a lot.

Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Tuesday, October 30, 2018 10:05 PM
To: Lin, Xueqin <xueqin.lin@intel.com>
Cc: Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero <alejandro.lucero@netronome.com> wrote:

On Tue, Oct 30, 2018 at 12:22 PM Lin, Xueqin <xueqin.lin@intel.com> wrote:
Some findings on some of our servers:
Without "intel_iommu=on iommu=pt" in /boot/grub2/grub.cfg (after a reboot to make the change effective):
18.11 rc1: testpmd and the secondary process set up successfully.

With "intel_iommu=on iommu=pt" in /boot/grub2/grub.cfg (after a reboot to make the change effective):
18.11 rc1: testpmd and the secondary process fail to set up.
18.11 rc1 + dma_mask_fix patch: testpmd sets up successfully, but the secondary process fails to set up.

Whether "intel_iommu=on iommu=pt" is enabled may explain the gap between our test results.
Most of our team's servers need the IOMMU enabled for VT-d and vfio tests.


That makes sense: the problem occurs when the IOVA mode is set inside drivers/bus/pci/linux/pci.c, and if there is no IOMMU, rte_eal_check_dma_mask is not called at all.


Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Tuesday, October 30, 2018 6:38 PM
To: Lin, Xueqin <xueqin.lin@intel.com>
Cc: Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:
Hi Lucero,

No, we have reproduced the multi-process issues (including symmetric_mp, simple_mp, hotplug_mp, the multi-process unit test…) on most of our servers.
It is also strange that 1~2 servers don’t have the issue.


Yes, you are right. I could execute it, but that was due to how this problem triggers.
I think I can fix this and, at the same time, properly solve the initial issue without any limitation such as the potential race condition I mentioned.
I can give you a patch to try in a couple of hours.


Hi Lin,

Can you try the patch attached?

Thanks

Thanks

Bind two NNT ports or FVL ports

./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]

Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0  0x00000000005566b5 in rte_fbarray_find_next_used ()
#1  0x000000000054da9c in rte_eal_check_dma_mask ()
#2  0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3  0x0000000000573988 in rte_pci_get_iommu_class ()
#4  0x000000000054f743 in rte_bus_get_iommu_class ()
#5  0x000000000053c123 in rte_eal_init ()
#6  0x000000000046be2b in main ()

Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Tuesday, October 30, 2018 5:41 PM
To: Lin, Xueqin <xueqin.lin@intel.com>
Cc: Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:
Hi Lucero&Thomas,

We found that the patch does not fix the multi-process cases.

Hi,

I think it is not specifically about multiprocess but about hotplug with multiprocess because I can execute the symmetric_mp successfully with a secondary process.

Working on this as a priority.

Thanks.

Steps:

1. Set up the primary process (succeeds)

./hotplug_mp --proc-type=auto

2. Set up the secondary process (fails)

./hotplug_mp --proc-type=auto

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
Segmentation fault (core dumped)

More information below:

Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
264             for (idx = first; idx < msk->n_masks; idx++) {
#0  0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
#1  0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
    used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
#2  0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
#3  0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
    arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
#4  0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
    at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
#5  0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
    at /root/dpdk/drivers/bus/pci/linux/pci.c:593
#6  0x00000000005b9738 in pci_devices_iommu_support_va ()
    at /root/dpdk/drivers/bus/pci/linux/pci.c:626
#7  0x00000000005b97a7 in rte_pci_get_iommu_class ()
    at /root/dpdk/drivers/bus/pci/linux/pci.c:650
#8  0x000000000058f1ce in rte_bus_get_iommu_class ()
    at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
#9  0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
    at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
    at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28


Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 9:41 PM
To: Yao, Lei A <lei.a.yao@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:


From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net> wrote:
29/10/2018 12:39, Alejandro Lucero:
> I have a patch that fixes a bug where rte_eal_check_dma_mask was called with
> the mask instead of the maskbits. However, this does not solve the deadlock.

The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
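
To illustrate the mask/maskbits mix-up mentioned above, here is a minimal, self-contained C sketch. It is not the DPDK implementation: `iova_fits`, `check_dma_mask_sketch`, and the flat array of IOVAs are hypothetical names and simplifications chosen for illustration. The assumption is that the check receives a width in bits (as the `maskbits=48` argument in the backtrace suggests) and derives the bit mask internally, so a caller passing an already-expanded mask would have it misread as a bit count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if iova fits in maskbits address bits. */
static int
iova_fits(uint64_t iova, uint8_t maskbits)
{
	uint64_t high_bits;

	assert(maskbits > 0 && maskbits <= 64);
	if (maskbits == 64)
		return 1;
	/* Every bit at or above position maskbits must be zero. */
	high_bits = ~((UINT64_C(1) << maskbits) - 1);
	return (iova & high_bits) == 0;
}

/* Hypothetical check over a flat array: 0 if all IOVAs fit, -1 otherwise. */
static int
check_dma_mask_sketch(const uint64_t *iovas, size_t n, uint8_t maskbits)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!iova_fits(iovas[i], maskbits))
			return -1;
	}
	return 0;
}
```

With this shape, a caller for a device limited to 40 address bits passes 40, not a precomputed mask value.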

Yao, can you try with the attached patch?

Hi, Lucero

This patch fixes the issue on my side. Thanks a lot
for your quick action.


Great!

I will send an official patch with the changes.

I have to say that I did test the patchset, but I think that was when legacy_mem was still in place, so the dynamic memory allocation code was not used during memory initialization.

There is something that concerns me, though. Using rte_memseg_walk_thread_unsafe could be a problem in some situations, although those situations are unlikely.

Usually, rte_eal_check_dma_mask is called during initialization, where it is safe to use the unsafe function for walking memsegs. But with device hotplug and dynamic memory allocation, there is a potential race condition when the primary process is allocating more memory while, concurrently, a device is hotplugged and a secondary process does the device initialization. For now, this is only a problem with the NFP, and the window for the race condition is really small, but I will work on this asap.

BRs
Lei

> Interestingly, the problem looks like a compiler one. The call to
> rte_memseg_walk does not return when made inside rte_eal_check_dma_mask,
> but if you modify the call like this:
>
> -       if (rte_memseg_walk(check_iova, &mask))
> +       if (!rte_memseg_walk(check_iova, &mask))
>
> it works, although the value returned to the invoker changes, of course.
> But the point here is that calling rte_memseg_walk should behave the same
> as before, and it does not.

Anyway, the coding style requires saving the return value in a variable
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
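
As a hedged illustration of the style point above, the sketch below stores the walker's return value in a variable and compares it explicitly against 0. `walk_segments`, `seg_walk_fn`, `check_out_of_range`, and `check_iovas` are hypothetical stand-ins, not the DPDK `rte_memseg_walk` API. The only assumption carried over is an int-returning walker whose non-zero result means a callback stopped the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*seg_walk_fn)(uint64_t iova, void *arg);

/* Hypothetical walker: stops at, and returns, the first non-zero result. */
static int
walk_segments(const uint64_t *iovas, size_t n, seg_walk_fn func, void *arg)
{
	size_t i;
	int ret;

	assert(func != NULL);
	for (i = 0; i < n; i++) {
		ret = func(iovas[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback: returns 1 when the IOVA has bits set in the mask given via arg. */
static int
check_out_of_range(uint64_t iova, void *arg)
{
	const uint64_t *mask = arg;

	return (iova & *mask) != 0 ? 1 : 0;
}

/* Returns -1 if any IOVA is out of range, 0 otherwise. */
static int
check_iovas(const uint64_t *iovas, size_t n, uint64_t mask)
{
	int ret;

	/* Save the result first, then test it explicitly against 0. */
	ret = walk_segments(iovas, n, check_out_of_range, &mask);
	if (ret != 0)
		return -1;
	return 0;
}
```

Keeping the result in `ret` also makes it easy to log or inspect the value before deciding what to return to the caller.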

PS: please do not top post and avoid HTML emails, thanks

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 14:14                                       ` Burakov, Anatoly
@ 2018-10-30 14:45                                         ` Alejandro Lucero
  0 siblings, 0 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-30 14:45 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: xueqin.lin, lei.a.yao, Thomas Monjalon, dev, Xu, Qian Q,
	Ferruh Yigit, Qi Zhang

On Tue, Oct 30, 2018 at 2:14 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 30-Oct-18 2:04 PM, Alejandro Lucero wrote:
> >
> >
> > On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero
> > <alejandro.lucero@netronome.com <mailto:alejandro.lucero@netronome.com>>
>
> > wrote:
> >
> >
> >
> >     On Tue, Oct 30, 2018 at 12:22 PM Lin, Xueqin <xueqin.lin@intel.com
> >     <mailto:xueqin.lin@intel.com>> wrote:
> >
> >         Some findings on some of our servers:
> >
> >         Without "intel_iommu=on iommu=pt" in /boot/grub2/grub.cfg
> >         (after a reboot to make the change effective):
> >
> >         18.11 rc1: testpmd and the secondary process set up successfully.
> >
> >         With "intel_iommu=on iommu=pt" in /boot/grub2/grub.cfg
> >         (after a reboot to make the change effective):
> >
> >         18.11 rc1: testpmd and the secondary process fail to set up.
> >
> >         18.11 rc1 + dma_mask_fix patch: testpmd sets up successfully,
> >         but the secondary process fails to set up.
> >
> >         Whether "intel_iommu=on iommu=pt" is enabled may explain the
> >         gap between our test results.
> >
> >         Most of our team's servers need the IOMMU enabled for VT-d and
> >         vfio tests.
> >
> >
> >     That makes sense: the problem occurs when the IOVA mode is set
> >     inside drivers/bus/pci/linux/pci.c, and if there is no IOMMU,
> >     rte_eal_check_dma_mask is not called at all.
> >
> >         Best regards,
> >
> >         Xueqin
> >
> >         *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> >         *Sent:* Tuesday, October 30, 2018 6:38 PM
> >         *To:* Lin, Xueqin <xueqin.lin@intel.com>
> >         *Cc:* Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon
> >         <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q
> >         <qian.q.xu@intel.com>; Burakov, Anatoly
> >         <anatoly.burakov@intel.com>; Yigit, Ferruh
> >         <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
> >         *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based
> >         on DMA mask
> >
> >         On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin
> >         <xueqin.lin@intel.com> wrote:
> >
> >             Hi Lucero,
> >
> >             No, we have reproduced the multi-process issues (including
> >             symmetric_mp, simple_mp, hotplug_mp, the multi-process unit
> >             test…) on most of our servers.
> >
> >             It is also strange that 1~2 servers don’t have the issue.
> >
> >         Yes, you are right. I could execute it, but that was due to how
> >         this problem triggers.
> >
> >         I think I can fix this and, at the same time, properly solve the
> >         initial issue without any limitation such as the potential race
> >         condition I mentioned.
> >
> >         I can give you a patch to try in a couple of hours.
> >
> >
> >
> > Hi Lin,
> >
> > Can you try the patch attached?
> >
> > Thanks
> >
> Hi Alejandro,
>
> Attachments are not supported on the mailing list :)
>

Apologies. I should have sent it just to Lin.


>
> --
> Thanks,
> Anatoly
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 14:04                                     ` Alejandro Lucero
@ 2018-10-30 14:14                                       ` Burakov, Anatoly
  2018-10-30 14:45                                         ` Alejandro Lucero
  2018-10-30 14:45                                       ` Lin, Xueqin
  1 sibling, 1 reply; 62+ messages in thread
From: Burakov, Anatoly @ 2018-10-30 14:14 UTC (permalink / raw)
  To: Alejandro Lucero, xueqin.lin
  Cc: lei.a.yao, Thomas Monjalon, dev, Xu, Qian Q, Ferruh Yigit, Qi Zhang

On 30-Oct-18 2:04 PM, Alejandro Lucero wrote:
> 
> 
> On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero 
> <alejandro.lucero@netronome.com <mailto:alejandro.lucero@netronome.com>> 
> wrote:
> 
> 
> 
>     On Tue, Oct 30, 2018 at 12:22 PM Lin, Xueqin <xueqin.lin@intel.com
>     <mailto:xueqin.lin@intel.com>> wrote:
> 
>         Some findings on some of our servers:
>
>         Without "intel_iommu=on iommu=pt" in /boot/grub2/grub.cfg
>         (after a reboot to make the change effective):
>
>         18.11 rc1: testpmd and the secondary process set up successfully.
>
>         With "intel_iommu=on iommu=pt" in /boot/grub2/grub.cfg
>         (after a reboot to make the change effective):
>
>         18.11 rc1: testpmd and the secondary process fail to set up.
>
>         18.11 rc1 + dma_mask_fix patch: testpmd sets up successfully,
>         but the secondary process fails to set up.
>
>         Whether "intel_iommu=on iommu=pt" is enabled may explain the
>         gap between our test results.
>
>         Most of our team's servers need the IOMMU enabled for VT-d and
>         vfio tests.
>
>
>     That makes sense: the problem occurs when the IOVA mode is set
>     inside drivers/bus/pci/linux/pci.c, and if there is no IOMMU,
>     rte_eal_check_dma_mask is not called at all.
>
>         Best regards,
>
>         Xueqin
>
>         *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>         *Sent:* Tuesday, October 30, 2018 6:38 PM
>         *To:* Lin, Xueqin <xueqin.lin@intel.com>
>         *Cc:* Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon
>         <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q
>         <qian.q.xu@intel.com>; Burakov, Anatoly
>         <anatoly.burakov@intel.com>; Yigit, Ferruh
>         <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
>         *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based
>         on DMA mask
>
>         On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin
>         <xueqin.lin@intel.com> wrote:
>
>             Hi Lucero,
>
>             No, we have reproduced the multi-process issues (including
>             symmetric_mp, simple_mp, hotplug_mp, the multi-process unit
>             test…) on most of our servers.
>
>             It is also strange that 1~2 servers don’t have the issue.
>
>         Yes, you are right. I could execute it, but that was due to how
>         this problem triggers.
>
>         I think I can fix this and, at the same time, properly solve the
>         initial issue without any limitation such as the potential race
>         condition I mentioned.
>
>         I can give you a patch to try in a couple of hours.
>
> 
> 
> Hi Lin,
> 
> Can you try the patch attached?
> 
> Thanks
> 
Hi Alejandro,

Attachments are not supported on the mailing list :)

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 12:37                                   ` Alejandro Lucero
@ 2018-10-30 14:04                                     ` Alejandro Lucero
  2018-10-30 14:14                                       ` Burakov, Anatoly
  2018-10-30 14:45                                       ` Lin, Xueqin
  0 siblings, 2 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-30 14:04 UTC (permalink / raw)
  To: xueqin.lin
  Cc: lei.a.yao, Thomas Monjalon, dev, Xu, Qian Q, Burakov, Anatoly,
	Ferruh Yigit, Qi Zhang

On Tue, Oct 30, 2018 at 12:37 PM Alejandro Lucero <
alejandro.lucero@netronome.com> wrote:

>
>
> On Tue, Oct 30, 2018 at 12:22 PM Lin, Xueqin <xueqin.lin@intel.com> wrote:
>
>> Some found on some our servers:
>>
>> If  not add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then
>> reboot to make it effective.
>>
>> 18.11 rc1: Success to setup testpmd  and secondary process.
>>
>>
>>
>> If  add  ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then
>> reboot to make it effective.
>>
>> 18.11 rc1:  Fail to setup testpmd  and secondary process.
>>
>> 18.11 rc1+ dma_mask_fix patch: success to setup testpmd, but fail to
>> setup secondary process.
>>
>>
>>
>> Maybe ”intel_iommu=on iommu=pt” enable or not result in our test gap.
>>
>> Most of our team servers should enable the IOMMU for VT-d and vfio test.
>>
>>
>>
>
> It makes sense because the problem is when the IOVA mode is set inside
> drivers/bus/pci/linux/pci.c and if there is not IOMMU, not call to
> rte_eal_check_dma_mask at all.
>
>
>
>> Best regards,
>>
>> Xueqin
>>
>>
>>
>> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>> *Sent:* Tuesday, October 30, 2018 6:38 PM
>> *To:* Lin, Xueqin <xueqin.lin@intel.com>
>> *Cc:* Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <
>> thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>;
>> Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <
>> ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
>> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA
>> mask
>>
>>
>>
>>
>>
>> On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin <xueqin.lin@intel.com>
>> wrote:
>>
>> Hi Lucero,
>>
>>
>>
>> No, we have reproduced multi-process issues(include symmetric_mp,
>> simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
>>
>> It is also strange that 1~2 servers don’t have the issue.
>>
>>
>>
>>
>>
>> Yes, you are right. I could execute it but it was due to how this problem
>> triggers.
>>
>> I think I can fix this and at the same time solving properly the initial
>> issue without any limitation like that potential race condition I
>> mentioned.
>>
>> I can give you a patch to try in a couple of hours.
>>
>>
>>
>
Hi Lin,

Can you try the patch attached?

Thanks




* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 12:21                                 ` Lin, Xueqin
@ 2018-10-30 12:37                                   ` Alejandro Lucero
  2018-10-30 14:04                                     ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-30 12:37 UTC (permalink / raw)
  To: xueqin.lin
  Cc: lei.a.yao, Thomas Monjalon, dev, Xu, Qian Q, Burakov, Anatoly,
	Ferruh Yigit, Qi Zhang

On Tue, Oct 30, 2018 at 12:22 PM Lin, Xueqin <xueqin.lin@intel.com> wrote:

> Some found on some our servers:
>
> If  not add ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then
> reboot to make it effective.
>
> 18.11 rc1: Success to setup testpmd  and secondary process.
>
>
>
> If  add  ”intel_iommu=on iommu=pt” in /boot/grub2/grub.cfg file, then
> reboot to make it effective.
>
> 18.11 rc1:  Fail to setup testpmd  and secondary process.
>
> 18.11 rc1+ dma_mask_fix patch: success to setup testpmd, but fail to setup
> secondary process.
>
>
>
> Maybe ”intel_iommu=on iommu=pt” enable or not result in our test gap.
>
> Most of our team servers should enable the IOMMU for VT-d and vfio test.
>
>
>

That makes sense: the problem arises when the IOVA mode is set inside
drivers/bus/pci/linux/pci.c, and if there is no IOMMU,
rte_eal_check_dma_mask is not called at all.
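
To illustrate why machines without an IOMMU never reach the failing code, here is a simplified model of the per-device decision. It is only a sketch, not the code in pci.c: model_iommu_support_va, stub_check_dma_mask and the two counters are hypothetical names, and the stub stands in for rte_eal_check_dma_mask. The MGAW field layout (bits 21:16 of the VT-d capability register, holding the width minus one) is assumed from the VT-d specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MGAW field of the VT-d capability register: bits 21:16, value is width - 1. */
#define VTD_CAP_MGAW_SHIFT 16
#define VTD_CAP_MGAW_MASK  (0x3FULL << VTD_CAP_MGAW_SHIFT)

static int dma_mask_checks;   /* how many times the stub below was called */
static uint8_t last_maskbits; /* maskbits seen by the last stub call */

/* Stub standing in for rte_eal_check_dma_mask(); always reports success. */
int stub_check_dma_mask(uint8_t maskbits)
{
	dma_mask_checks++;
	last_maskbits = maskbits;
	return 0;
}

/* Hypothetical model of the "can this device use IOVA VA?" decision. */
bool model_iommu_support_va(bool has_intel_iommu, uint64_t vtd_cap_reg)
{
	uint8_t mgaw;

	if (!has_intel_iommu)
		return true; /* no IOMMU: the DMA mask check is never reached */

	mgaw = (uint8_t)(((vtd_cap_reg & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 1);
	assert(mgaw >= 1 && mgaw <= 64);
	return stub_check_dma_mask(mgaw) == 0;
}
```

With an MGAW field of 47 the model passes 48 bits to the check, which matches the maskbits=48 seen in the backtrace.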





* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 10:38                               ` Alejandro Lucero
@ 2018-10-30 12:21                                 ` Lin, Xueqin
  2018-10-30 12:37                                   ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Lin, Xueqin @ 2018-10-30 12:21 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: Yao, Lei A, Thomas Monjalon, dev, Xu, Qian Q, Burakov, Anatoly,
	Yigit, Ferruh, Zhang, Qi Z

Some findings on some of our servers:
Without "intel_iommu=on iommu=pt" in the /boot/grub2/grub.cfg file (rebooting to make the setting effective):
18.11 rc1: testpmd and the secondary process both set up successfully.

With "intel_iommu=on iommu=pt" added to the /boot/grub2/grub.cfg file (rebooting to make the setting effective):
18.11 rc1: testpmd and the secondary process both fail to set up.
18.11 rc1 + dma_mask_fix patch: testpmd sets up successfully, but the secondary process fails to set up.

Whether "intel_iommu=on iommu=pt" is enabled or not may explain the gap between our test results.
Most of our team's servers need the IOMMU enabled for VT-d and vfio tests.
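
When comparing servers, it can help to confirm which of these kernel parameters the running kernel was actually booted with. The following is a minimal sketch, assuming a Linux system that exposes /proc/cmdline; cmdline_has_token and running_kernel_has_token are hypothetical helper names, not part of DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `token` appears as a whole space-separated word in `cmdline`. */
bool cmdline_has_token(const char *cmdline, const char *token)
{
	size_t len;
	const char *p = cmdline;

	assert(cmdline != NULL && token != NULL);
	len = strlen(token);
	if (len == 0)
		return false;
	while ((p = strstr(p, token)) != NULL) {
		bool starts_word = (p == cmdline) || p[-1] == ' ';
		bool ends_word = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

		if (starts_word && ends_word)
			return true;
		p += 1; /* keep searching after a partial match */
	}
	return false;
}

/* Read /proc/cmdline (Linux only) and report whether `token` is present. */
bool running_kernel_has_token(const char *token)
{
	char buf[4096];
	bool found = false;
	FILE *f = fopen("/proc/cmdline", "r");

	if (f == NULL)
		return false;
	if (fgets(buf, sizeof(buf), f) != NULL)
		found = cmdline_has_token(buf, token);
	fclose(f);
	return found;
}
```

Calling running_kernel_has_token("intel_iommu=on") and running_kernel_has_token("iommu=pt") on each server would show which of the two configurations it falls into.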

Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Tuesday, October 30, 2018 6:38 PM
To: Lin, Xueqin <xueqin.lin@intel.com>
Cc: Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:
Hi Lucero,

No, we have reproduced multi-process issues (including symmetric_mp, simple_mp, hotplug_mp, the multi-process unit test…) on most of our servers.
It is also strange that 1~2 servers don’t have the issue.


Yes, you are right. I could execute it, but only because of how this problem triggers.
I think I can fix this and, at the same time, properly solve the initial issue without any limitation such as the potential race condition I mentioned.
I can give you a patch to try in a couple of hours.

Thanks

Bind two NNT ports or FVL ports

./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]

Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0  0x00000000005566b5 in rte_fbarray_find_next_used ()
#1  0x000000000054da9c in rte_eal_check_dma_mask ()
#2  0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3  0x0000000000573988 in rte_pci_get_iommu_class ()
#4  0x000000000054f743 in rte_bus_get_iommu_class ()
#5  0x000000000053c123 in rte_eal_init ()
#6  0x000000000046be2b in main ()

Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Tuesday, October 30, 2018 5:41 PM
To: Lin, Xueqin <xueqin.lin@intel.com>
Cc: Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:
Hi Lucero & Thomas,

We found that the patch does not fix the multi-process cases.

Hi,

I think it is not specifically about multiprocess but about hotplug with multiprocess because I can execute the symmetric_mp successfully with a secondary process.

Working on this as a priority.

Thanks.

Steps:

1. Set up the primary process successfully

./hotplug_mp --proc-type=auto

2. The secondary process fails to set up

./hotplug_mp --proc-type=auto
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
Segmentation fault (core dumped)


More information as below:

Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
264             for (idx = first; idx < msk->n_masks; idx++) {
#0  0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
#1  0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
    used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
#2  0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
#3  0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
    arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
#4  0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
    at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
#5  0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
    at /root/dpdk/drivers/bus/pci/linux/pci.c:593
#6  0x00000000005b9738 in pci_devices_iommu_support_va ()
    at /root/dpdk/drivers/bus/pci/linux/pci.c:626
#7  0x00000000005b97a7 in rte_pci_get_iommu_class ()
    at /root/dpdk/drivers/bus/pci/linux/pci.c:650
#8  0x000000000058f1ce in rte_bus_get_iommu_class ()
    at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
#9  0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
    at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
    at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28


Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 9:41 PM
To: Yao, Lei A <lei.a.yao@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:


From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net> wrote:
29/10/2018 12:39, Alejandro Lucero:
> I got a patch that solves a bug when calling rte_eal_check_dma_mask using the
> mask instead of the maskbits. However, this does not solve the deadlock.

The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
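
As the backtrace in this thread shows, the check takes a bit count (maskbits) rather than a prebuilt mask. The relation between the two can be sketched as below. This is only an illustration under that assumption: dma_mask_from_bits and iova_fits are hypothetical helpers, not the EAL implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with every bit at or above position `maskbits` set. */
uint64_t dma_mask_from_bits(uint8_t maskbits)
{
	assert(maskbits > 0 && maskbits < 64);
	return ~((1ULL << maskbits) - 1);
}

/* Return 0 if `iova` is addressable with `maskbits` bits, -1 otherwise. */
int iova_fits(uint64_t iova, uint8_t maskbits)
{
	return (iova & dma_mask_from_bits(maskbits)) == 0 ? 0 : -1;
}
```

For a 40-bit device, iova_fits(addr, 40) accepts any address below 1 TB. Passing a prebuilt mask where the bit count is expected would give a meaningless result, since the argument is only 8 bits wide.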

Yao, can you try with the attached patch?

Hi, Lucero

This patch fixes the issue on my side. Thanks a lot
for your quick action.


Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think that was back when legacy_mem was still there, and therefore the dynamic memory allocation code was not used during memory initialization.

There is something that concerns me though. Using rte_memseg_walk_thread_unsafe could be a problem in some situations, although those situations are unlikely.

Usually, rte_eal_check_dma_mask is called during initialization, where it is safe to use the unsafe function for walking memsegs. With device hotplug and dynamic memory allocation, however, there is a potential race condition when the primary process is allocating more memory while, concurrently, a device is hotplugged and a secondary process does the device initialization. For now, this is only a problem with the NFP, and the potential race window is really unlikely to be hit, but I will work on this asap.

BRs
Lei

> Interestingly, the problem looks like a compiler one. Calling
> rte_memseg_walk does not return when called inside rte_eal_check_dma_mask, but if
> you modify the call like this:
>
> -       if (rte_memseg_walk(check_iova, &mask))
> +       if (!rte_memseg_walk(check_iova, &mask))
>
> it works, although the value returned to the invoker changes, of course.
> But the point here is that the behaviour when calling rte_memseg_walk
> should be the same as before, and it is not.

Anyway, the coding style requires saving the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.
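
The style point can be illustrated with a small self-contained sketch. All names here (walk_values, exceeds_limit, check_values) are hypothetical stand-ins for a walker with a callback, such as rte_memseg_walk with check_iova; none of them is DPDK code.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*walk_fn)(int value, void *arg);

/* Hypothetical walker: calls func on each element, stops at the first nonzero return. */
int walk_values(const int *values, size_t n, walk_fn func, void *arg)
{
	size_t i;
	int ret;

	assert(func != NULL);
	for (i = 0; i < n; i++) {
		ret = func(values[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback: returns 1 if value exceeds the limit pointed to by arg, else 0. */
int exceeds_limit(int value, void *arg)
{
	const int *limit = arg;

	return value > *limit ? 1 : 0;
}

/* Requested style: store the result first, then compare explicitly with 0. */
int check_values(const int *values, size_t n, int limit)
{
	int ret;

	ret = walk_values(values, n, exceeds_limit, &limit);
	if (ret != 0)
		return -1;
	return 0;
}
```

Keeping the return value in a variable also makes it easy to log or inspect in a debugger, which helps when a call appears not to return as expected.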

PS: please do not top post and avoid HTML emails, thanks


* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 10:33                             ` Lin, Xueqin
@ 2018-10-30 10:38                               ` Alejandro Lucero
  2018-10-30 12:21                                 ` Lin, Xueqin
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-30 10:38 UTC (permalink / raw)
  To: xueqin.lin
  Cc: lei.a.yao, Thomas Monjalon, dev, Xu, Qian Q, Burakov, Anatoly,
	Ferruh Yigit, Qi Zhang

On Tue, Oct 30, 2018 at 10:34 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:

> Hi Lucero,
>
>
>
> No, we have reproduced multi-process issues(include symmetric_mp,
> simple_mp, hotplug_mp, multi-process unit test… )on most of our servers.
>
> It is also strange that 1~2 servers don’t have the issue.
>
>
>

Yes, you are right. I could execute it, but only because of how this
problem triggers.
I think I can fix this and, at the same time, properly solve the initial
issue without any limitation such as the potential race condition I
mentioned.
I can give you a patch to try in a couple of hours.

Thanks


> Bind two NNT ports or FVL ports
>
>
>
> ./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4
> --proc-id=1
>
>
>
> EAL: Detected 88 lcore(s)
>
> EAL: Detected 2 NUMA nodes
>
> EAL: Auto-detected process type: SECONDARY
>
> [New Thread 0x7ffff6eda700 (LWP 90103)]
>
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
>
> [New Thread 0x7ffff66d9700 (LWP 90104)]
>
>
>
> Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
>
> 0x00000000005566b5 in rte_fbarray_find_next_used ()
>
> (gdb) bt
>
> #0  0x00000000005566b5 in rte_fbarray_find_next_used ()
>
> #1  0x000000000054da9c in rte_eal_check_dma_mask ()
>
> #2  0x0000000000572ae7 in pci_one_device_iommu_support_va ()
>
> #3  0x0000000000573988 in rte_pci_get_iommu_class ()
>
> #4  0x000000000054f743 in rte_bus_get_iommu_class ()
>
> #5  0x000000000053c123 in rte_eal_init ()
>
> #6  0x000000000046be2b in main ()
>
>
>
> Best regards,
>
> Xueqin
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Tuesday, October 30, 2018 5:41 PM
> *To:* Lin, Xueqin <xueqin.lin@intel.com>
> *Cc:* Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <
> thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <
> ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:
>
> Hi Lucero&Thomas,
>
>
>
> Find the patch can’t fix multi-process cases.
>
>
>
> Hi,
>
>
>
> I think it is not specifically about multiprocess but about hotplug with
> multiprocess because I can execute the symmetric_mp successfully with a
> secondary process.
>
>
>
> Working on this as a priority.
>
>
>
> Thanks.
>
>
>
> Steps:
>
> 1. Setup primary process successfully
>
> ./hotplug_mp --proc-type=auto
>
> 2. Fail to setup secondary process
>
> ./hotplug_mp --proc-type=auto
> EAL: Detected 88 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Auto-detected process type: SECONDARY
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
> Segmentation fault (core dumped)
>
> More information as below:
>
> Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
> 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
> 264             for (idx = first; idx < msk->n_masks; idx++) {
> #0  0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
> #1  0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
>     used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
> #2  0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
> #3  0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
>     arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
> #4  0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
>     at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
> #5  0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:593
> #6  0x00000000005b9738 in pci_devices_iommu_support_va ()
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:626
> #7  0x00000000005b97a7 in rte_pci_get_iommu_class ()
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:650
> #8  0x000000000058f1ce in rte_bus_get_iommu_class ()
>     at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
> #9  0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
>     at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
> #10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
>     at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28
>
>
>
>
>
> Best regards,
>
> Xueqin
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Monday, October 29, 2018 9:41 PM
> *To:* Yao, Lei A <lei.a.yao@intel.com>
> *Cc:* Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian
> Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov,
> Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com
> >
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:
>
>
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Monday, October 29, 2018 8:56 PM
> *To:* Thomas Monjalon <thomas@monjalon.net>
> *Cc:* Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <
> qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov,
> Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com
> >
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
> wrote:
>
> 29/10/2018 12:39, Alejandro Lucero:
> > I got a patch that solves a bug when calling rte_eal_dma_mask using the
> > mask instead of the maskbits. However, this does not solves the
> deadlock.
>
> The deadlock is a bigger concern I think.
>
>
>
> I think once the call to rte_eal_check_dma_mask uses the maskbits instead
> of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
>
>
>
> Yao, can you try with the attached patch?
>
>
>
> Hi, Lucero
>
>
>
> This patch can fix the issue at my side. Thanks a lot
>
> for you quick action.
>
>
>
>
>
> Great!
>
>
>
> I will send an official patch with the changes.
>
>
>
> I have to say that I tested the patchset, but I think it was where
> legacy_mem was still there and therefore dynamic memory allocation code not
> used during memory initialization.
>
>
>
> There is something that concerns me though. Using
> rte_memseg_walk_thread_unsafe could be a problem under some situations
> although those situations being unlikely.
>
>
>
> Usually, calling rte_eal_check_dma_mask happens during initialization.
> Then it is safe to use the unsafe function for walking memsegs, but with
> device hotplug and dynamic memory allocation, there exists a potential race
> condition when the primary process is allocating more memory and
> concurrently a device is hotplugged and a secondary process does the device
> initialization. By now, this is just a problem with the NFP, and the
> potential race condition window really unlikely, but I will work on this
> asap.
>
>
>
> BRs
>
> Lei
>
>
>
> > Interestingly, the problem looks like a compiler one. Calling
> > rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
> but if
> > you modify the call like this:
> >
> > -       if (rte_memseg_walk(check_iova, &mask))
> > +       if (!rte_memseg_walk(check_iova, &mask))
> >
> > it works, although the value returned to the invoker changes, of course.
> > But the point here is it should be the same behaviour when calling
> > rte_memseg_walk than before and it is not.
>
> Anyway, the coding style requires to save the return value in a variable,
> instead of nesting the call in an "if" condition.
> And the "if" check should be explicitly != 0 because it is not a real
> boolean.
>
> PS: please do not top post and avoid HTML emails, thanks
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30  9:41                           ` Alejandro Lucero
@ 2018-10-30 10:33                             ` Lin, Xueqin
  2018-10-30 10:38                               ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Lin, Xueqin @ 2018-10-30 10:33 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: Yao, Lei A, Thomas Monjalon, dev, Xu, Qian Q, Burakov, Anatoly,
	Yigit, Ferruh, Zhang, Qi Z

Hi Lucero,

No, we have reproduced the multi-process issue (including symmetric_mp, simple_mp, hotplug_mp and the multi-process unit tests) on most of our servers.
Strangely, one or two servers do not show the issue.

Bind two NNT ports or FVL ports

./build/symmetric_mp -c 4 --proc-type=auto -- -p 3 --num-procs=4 --proc-id=1

EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
[New Thread 0x7ffff6eda700 (LWP 90103)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_90099_2f1b553882b62
[New Thread 0x7ffff66d9700 (LWP 90104)]

Thread 1 "symmetric_mp" received signal SIGSEGV, Segmentation fault.
0x00000000005566b5 in rte_fbarray_find_next_used ()
(gdb) bt
#0  0x00000000005566b5 in rte_fbarray_find_next_used ()
#1  0x000000000054da9c in rte_eal_check_dma_mask ()
#2  0x0000000000572ae7 in pci_one_device_iommu_support_va ()
#3  0x0000000000573988 in rte_pci_get_iommu_class ()
#4  0x000000000054f743 in rte_bus_get_iommu_class ()
#5  0x000000000053c123 in rte_eal_init ()
#6  0x000000000046be2b in main ()

Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Tuesday, October 30, 2018 5:41 PM
To: Lin, Xueqin <xueqin.lin@intel.com>
Cc: Yao, Lei A <lei.a.yao@intel.com>; Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:
Hi Lucero&Thomas,

Find the patch can’t fix multi-process cases.

Hi,

I think it is not specifically about multiprocess but about hotplug with multiprocess because I can execute the symmetric_mp successfully with a secondary process.

Working on this as a priority.

Thanks.

Steps:

1. Setup primary process successfully

./hotplug_mp --proc-type=auto

2. Fail to setup secondary process

./hotplug_mp --proc-type=auto
EAL: Detected 88 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Auto-detected process type: SECONDARY
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
Segmentation fault (core dumped)

More information as below:

Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
264             for (idx = first; idx < msk->n_masks; idx++) {
#0  0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
#1  0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
    used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
#2  0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
#3  0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
    arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
#4  0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
    at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
#5  0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
    at /root/dpdk/drivers/bus/pci/linux/pci.c:593
#6  0x00000000005b9738 in pci_devices_iommu_support_va ()
    at /root/dpdk/drivers/bus/pci/linux/pci.c:626
#7  0x00000000005b97a7 in rte_pci_get_iommu_class ()
    at /root/dpdk/drivers/bus/pci/linux/pci.c:650
#8  0x000000000058f1ce in rte_bus_get_iommu_class ()
    at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
#9  0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
    at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
    at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28


Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 9:41 PM
To: Yao, Lei A <lei.a.yao@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:


From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net> wrote:
29/10/2018 12:39, Alejandro Lucero:
> I got a patch that solves a bug when calling rte_eal_dma_mask using the
> mask instead of the maskbits. However, this does not solves the deadlock.

The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?

Hi, Lucero

This patch can fix the issue at my side. Thanks a lot
for you quick action.


Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think it was where legacy_mem was still there and therefore dynamic memory allocation code not used during memory initialization.

There is something that concerns me though. Using rte_memseg_walk_thread_unsafe could be a problem under some situations although those situations being unlikely.

Usually, calling rte_eal_check_dma_mask happens during initialization. Then it is safe to use the unsafe function for walking memsegs, but with device hotplug and dynamic memory allocation, there exists a potential race condition when the primary process is allocating more memory and concurrently a device is hotplugged and a secondary process does the device initialization. By now, this is just a problem with the NFP, and the potential race condition window really unlikely, but I will work on this asap.

BRs
Lei

> Interestingly, the problem looks like a compiler one. Calling
> rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but if
> you modify the call like this:
>
> -       if (rte_memseg_walk(check_iova, &mask))
> +       if (!rte_memseg_walk(check_iova, &mask))
>
> it works, although the value returned to the invoker changes, of course.
> But the point here is it should be the same behaviour when calling
> rte_memseg_walk than before and it is not.

Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top post and avoid HTML emails, thanks

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 10:18                 ` Burakov, Anatoly
@ 2018-10-30 10:23                   ` Alejandro Lucero
  0 siblings, 0 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-30 10:23 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: Thomas Monjalon, lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Ferruh Yigit

On Tue, Oct 30, 2018 at 10:19 AM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 29-Oct-18 11:39 AM, Alejandro Lucero wrote:
> > I got a patch that solves a bug when calling rte_eal_dma_mask using the
> > mask instead of the maskbits. However, this does not solves the deadlock.
> >
> > Interestingly, the problem looks like a compiler one. Calling
> > rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but
> > if you modify the call like this:
> >
> > diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> > index 12dcedf5c..69b26e464 100644
> > --- a/lib/librte_eal/common/eal_common_memory.c
> > +++ b/lib/librte_eal/common/eal_common_memory.c
> > @@ -462,7 +462,7 @@ rte_eal_check_dma_mask(uint8_t maskbits)
> >         /* create dma mask */
> >         mask = ~((1ULL << maskbits) - 1);
> > -       if (rte_memseg_walk(check_iova, &mask))
> > +       if (!rte_memseg_walk(check_iova, &mask))
> >                 /*
> >                  * Dma mask precludes hugepage usage.
> >                  * This device can not be used and we do not need to keep
> >
> >
> > it works, although the value returned to the invoker changes, of course.
> > But the point here is it should be the same behaviour when calling
> > rte_memseg_walk than before and it is not.
> >
> >
> > Anatoly, maybe you can see something I can not.
> >
>
> memseg walk will return 0 only when each callback returned 0 and there
> were no more segments left to call callbacks on. If your code always
> returns 0, then return value of memseg_walk will always be zero.
>
> If your code returns 1 or -1 in some cases, then this error condition
> will trigger. If it doesn't, then your condition by which you decide to
> return 1 or 0, is incorrect :) I couldn't spot any obvious issues there,
> but i'll recheck.
>
>
Thanks for looking at this, but I was wrong. Inverting the check changes
the return code and therefore the code path that follows, so it does not
make sense to compare the two variants.

-- 
> Thanks,
> Anatoly
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30 10:11                           ` Burakov, Anatoly
@ 2018-10-30 10:19                             ` Alejandro Lucero
  0 siblings, 0 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-30 10:19 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: Thomas Monjalon, lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Ferruh Yigit

On Tue, Oct 30, 2018 at 10:11 AM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 29-Oct-18 2:18 PM, Thomas Monjalon wrote:
> > 29/10/2018 14:40, Alejandro Lucero:
> >> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:
> >>> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> >>> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
> >>> wrote:
> >>>
> >>> 29/10/2018 12:39, Alejandro Lucero:
> >>>> I got a patch that solves a bug when calling rte_eal_dma_mask using
> the
> >>>> mask instead of the maskbits. However, this does not solves the
> >>> deadlock.
> >>>
> >>> The deadlock is a bigger concern I think.
> >>>
> >>> I think once the call to rte_eal_check_dma_mask uses the maskbits
> instead
> >>> of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
> >>>
> >>> Yao, can you try with the attached patch?
> >>>
> >>> Hi, Lucero
> >>>
> >>> This patch can fix the issue at my side. Thanks a lot
> >>> for you quick action.
> >>
> >> Great!
> >>
> >> I will send an official patch with the changes.
> >
> > Please, do not forget my other request to better comment functions.
> >
> >
> >> I have to say that I tested the patchset, but I think it was where
> >> legacy_mem was still there and therefore dynamic memory allocation code
> not
> >> used during memory initialization.
> >>
> >> There is something that concerns me though. Using
> >> rte_memseg_walk_thread_unsafe could be a problem under some situations
> >> although those situations being unlikely.
> >>
> >> Usually, calling rte_eal_check_dma_mask happens during initialization.
> Then
> >> it is safe to use the unsafe function for walking memsegs, but with
> device
> >> hotplug and dynamic memory allocation, there exists a potential race
> >> condition when the primary process is allocating more memory and
> >> concurrently a device is hotplugged and a secondary process does the
> device
> >> initialization. By now, this is just a problem with the NFP, and the
> >> potential race condition window really unlikely, but I will work on this
> >> asap.
> >
> > Yes, this is what concerns me.
> > You can add a comment explaining the unsafe which is not handled.
>
> The issue here is that this code is called from both memory-locked and
> memory-unlocked context. Virtio had a similar issue with their mem table
> update code - they solved it by manually locking the memory before doing
> everything else, and using thread_unsafe version of the walk.
>
> Could something like that be done here?
>
>
I have a patch adding safe and unsafe versions of the DMA mask check.
However, because of the multiprocess problem reported, I think the fix
requires a different kind of work.

The problem I see now is that calling rte_eal_check_dma_mask from the
set_iova_mode code path is wrong. It cannot be done at that point because
the memory has not been initialized yet.



> >
> >
> >>>> Interestingly, the problem looks like a compiler one. Calling
> >>>> rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
> >>> but if
> >>>> you modify the call like this:
> >>>>
> >>>> -       if (rte_memseg_walk(check_iova, &mask))
> >>>> +       if (!rte_memseg_walk(check_iova, &mask))
> >>>>
> >>>> it works, although the value returned to the invoker changes, of
> course.
> >>>> But the point here is it should be the same behaviour when calling
> >>>> rte_memseg_walk than before and it is not.
> >>>
> >>> Anyway, the coding style requires to save the return value in a
> variable,
> >>> instead of nesting the call in an "if" condition.
> >>> And the "if" check should be explicitly != 0 because it is not a real
> >>> boolean.
> >>>
> >>> PS: please do not top post and avoid HTML emails, thanks
> >>>
> >>>
> >>
> >
> >
> >
> >
> >
> >
>
>
> --
> Thanks,
> Anatoly
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 11:39               ` Alejandro Lucero
  2018-10-29 11:46                 ` Thomas Monjalon
@ 2018-10-30 10:18                 ` Burakov, Anatoly
  2018-10-30 10:23                   ` Alejandro Lucero
  1 sibling, 1 reply; 62+ messages in thread
From: Burakov, Anatoly @ 2018-10-30 10:18 UTC (permalink / raw)
  To: Alejandro Lucero, Thomas Monjalon
  Cc: lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Ferruh Yigit

On 29-Oct-18 11:39 AM, Alejandro Lucero wrote:
> I got a patch that solves a bug when calling rte_eal_dma_mask using the 
> mask instead of the maskbits. However, this does not solves the deadlock.
> 
> Interestingly, the problem looks like a compiler one. Calling 
> rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but 
> if you modify the call like this:
> 
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index 12dcedf5c..69b26e464 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -462,7 +462,7 @@ rte_eal_check_dma_mask(uint8_t maskbits)
>         /* create dma mask */
>         mask = ~((1ULL << maskbits) - 1);
> -       if (rte_memseg_walk(check_iova, &mask))
> +       if (!rte_memseg_walk(check_iova, &mask))
>                 /*
>                  * Dma mask precludes hugepage usage.
>                  * This device can not be used and we do not need to keep
> 
> 
> it works, although the value returned to the invoker changes, of course. 
> But the point here is it should be the same behaviour when calling 
> rte_memseg_walk than before and it is not.
> 
> 
> Anatoly, maybe you can see something I can not.
> 

The memseg walk will return 0 only when every callback returned 0 and there
were no more segments left to call callbacks on. If your callback always
returns 0, then the return value of memseg_walk will always be zero.

If your callback returns 1 or -1 in some cases, then this error condition
will trigger. If it doesn't, then the condition by which you decide to
return 1 or 0 is incorrect :) I couldn't spot any obvious issues there,
but I'll recheck.
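
As a rough illustration of the return-value semantics described above, here
is a minimal, self-contained sketch in C. It does not use the DPDK API:
struct seg, seg_walk(), check_iova_sketch() and check_dma_mask_sketch() are
hypothetical stand-ins, and the walk simply stops at the first nonzero
callback result and propagates it. The caller stores the return value in a
variable and compares it explicitly against 0, as the coding style asks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory segment; only the IOVA matters here. */
struct seg {
	uint64_t iova;
};

typedef int (*seg_walk_t)(const struct seg *s, void *arg);

/*
 * Simplified walk (an assumption, not the DPDK implementation): call func
 * on every segment, stop at the first nonzero result and return it.
 * Returns 0 only if every callback returned 0.
 */
static int seg_walk(const struct seg *segs, size_t n, seg_walk_t func, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(&segs[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback: return 1 if the IOVA has any bit set above the supported range. */
static int check_iova_sketch(const struct seg *s, void *arg)
{
	const uint64_t *mask = arg;

	return (s->iova & *mask) != 0 ? 1 : 0;
}

/* Return 0 if all segments fit in maskbits address bits, -1 otherwise. */
static int check_dma_mask_sketch(const struct seg *segs, size_t n, uint8_t maskbits)
{
	uint64_t mask;
	int ret;

	/* Shifting by 64 or more is undefined; every 64-bit IOVA fits anyway. */
	if (maskbits >= 64)
		return 0;

	mask = ~((1ULL << maskbits) - 1);

	ret = seg_walk(segs, n, check_iova_sketch, &mask);
	if (ret != 0)
		return -1;

	return 0;
}
```

With this shape, inverting the condition on the walk result does not change
what the walk does, only which branch the caller takes afterwards.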

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 14:18                         ` Thomas Monjalon
  2018-10-29 14:35                           ` Alejandro Lucero
  2018-10-29 18:54                           ` Yongseok Koh
@ 2018-10-30 10:11                           ` Burakov, Anatoly
  2018-10-30 10:19                             ` Alejandro Lucero
  2 siblings, 1 reply; 62+ messages in thread
From: Burakov, Anatoly @ 2018-10-30 10:11 UTC (permalink / raw)
  To: Thomas Monjalon, Alejandro Lucero
  Cc: lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Ferruh Yigit

On 29-Oct-18 2:18 PM, Thomas Monjalon wrote:
> 29/10/2018 14:40, Alejandro Lucero:
>> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:
>>> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>>> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
>>> wrote:
>>>
>>> 29/10/2018 12:39, Alejandro Lucero:
>>>> I got a patch that solves a bug when calling rte_eal_dma_mask using the
>>>> mask instead of the maskbits. However, this does not solves the
>>> deadlock.
>>>
>>> The deadlock is a bigger concern I think.
>>>
>>> I think once the call to rte_eal_check_dma_mask uses the maskbits instead
>>> of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
>>>
>>> Yao, can you try with the attached patch?
>>>
>>> Hi, Lucero
>>>
>>> This patch can fix the issue at my side. Thanks a lot
>>> for you quick action.
>>
>> Great!
>>
>> I will send an official patch with the changes.
> 
> Please, do not forget my other request to better comment functions.
> 
> 
>> I have to say that I tested the patchset, but I think it was where
>> legacy_mem was still there and therefore dynamic memory allocation code not
>> used during memory initialization.
>>
>> There is something that concerns me though. Using
>> rte_memseg_walk_thread_unsafe could be a problem under some situations
>> although those situations being unlikely.
>>
>> Usually, calling rte_eal_check_dma_mask happens during initialization. Then
>> it is safe to use the unsafe function for walking memsegs, but with device
>> hotplug and dynamic memory allocation, there exists a potential race
>> condition when the primary process is allocating more memory and
>> concurrently a device is hotplugged and a secondary process does the device
>> initialization. By now, this is just a problem with the NFP, and the
>> potential race condition window really unlikely, but I will work on this
>> asap.
> 
> Yes, this is what concerns me.
> You can add a comment explaining the unsafe which is not handled.

The issue here is that this code is called from both memory-locked and 
memory-unlocked context. Virtio had a similar issue with their mem table 
update code - they solved it by manually locking the memory before doing 
everything else, and using thread_unsafe version of the walk.

Could something like that be done here?
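
To make the suggestion concrete, here is a minimal sketch of the pattern,
under loud assumptions: it does not use the EAL locks or memseg lists. A
C11 atomic_flag spinlock stands in for the memory hotplug lock, a small
array stands in for the segment list, and all names (mem_lock_acquire,
check_dma_mask_thread_unsafe, check_dma_mask_safe, add_segment_checked) are
hypothetical. The idea is one check that assumes the lock is already held,
plus a wrapper that takes the lock for callers in unlocked context.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGS 8

/* Hypothetical stand-ins for the memory lock and the segment list. */
static atomic_flag mem_lock = ATOMIC_FLAG_INIT;
static uint64_t seg_iova[MAX_SEGS];
static size_t n_segs;

static void mem_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&mem_lock, memory_order_acquire))
		; /* spin until the lock is free */
}

static void mem_lock_release(void)
{
	atomic_flag_clear_explicit(&mem_lock, memory_order_release);
}

/* Caller must already hold mem_lock (memory-locked context). */
static int check_dma_mask_thread_unsafe(uint8_t maskbits)
{
	uint64_t mask;
	size_t i;

	if (maskbits >= 64)
		return 0;

	mask = ~((1ULL << maskbits) - 1);
	for (i = 0; i < n_segs; i++) {
		if ((seg_iova[i] & mask) != 0)
			return -1;
	}
	return 0;
}

/* For memory-unlocked context: take the lock, then do the unsafe walk. */
static int check_dma_mask_safe(uint8_t maskbits)
{
	int ret;

	mem_lock_acquire();
	ret = check_dma_mask_thread_unsafe(maskbits);
	mem_lock_release();
	return ret;
}

/*
 * Models an allocation path that already holds the lock. Calling
 * check_dma_mask_safe() from here would spin forever on mem_lock,
 * so the thread-unsafe variant is used. A segment that fails the
 * check is rolled back.
 */
static int add_segment_checked(uint64_t iova, uint8_t maskbits)
{
	int ret;

	mem_lock_acquire();
	if (n_segs == MAX_SEGS) {
		mem_lock_release();
		return -1;
	}
	seg_iova[n_segs++] = iova;
	ret = check_dma_mask_thread_unsafe(maskbits);
	if (ret != 0)
		n_segs--;
	mem_lock_release();
	return ret;
}
```

The design choice is that the caller, not the check, decides about locking,
which is what lets the same walk serve both contexts.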

> 
> 
>>>> Interestingly, the problem looks like a compiler one. Calling
>>>> rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
>>> but if
>>>> you modify the call like this:
>>>>
>>>> -       if (rte_memseg_walk(check_iova, &mask))
>>>> +       if (!rte_memseg_walk(check_iova, &mask))
>>>>
>>>> it works, although the value returned to the invoker changes, of course.
>>>> But the point here is it should be the same behaviour when calling
>>>> rte_memseg_walk than before and it is not.
>>>
>>> Anyway, the coding style requires to save the return value in a variable,
>>> instead of nesting the call in an "if" condition.
>>> And the "if" check should be explicitly != 0 because it is not a real
>>> boolean.
>>>
>>> PS: please do not top post and avoid HTML emails, thanks
>>>
>>>
>>
> 
> 
> 
> 
> 
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 19:37                             ` Alejandro Lucero
@ 2018-10-30 10:10                               ` Burakov, Anatoly
  0 siblings, 0 replies; 62+ messages in thread
From: Burakov, Anatoly @ 2018-10-30 10:10 UTC (permalink / raw)
  To: Alejandro Lucero, Yongseok Koh
  Cc: lei.a.yao, Thomas Monjalon, dev, Xu, Qian Q, xueqin.lin, Ferruh Yigit

On 29-Oct-18 7:37 PM, Alejandro Lucero wrote:
> 
> 
> On Mon, Oct 29, 2018 at 6:54 PM Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> 
>      > On Oct 29, 2018, at 7:18 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
>      >
>      > 29/10/2018 14:40, Alejandro Lucero:
>      >> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:
>      >>> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>      >>> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
>      >>> wrote:
>      >>>
>      >>> 29/10/2018 12:39, Alejandro Lucero:
>      >>>> I got a patch that solves a bug when calling rte_eal_dma_mask
>     using the
>      >>>> mask instead of the maskbits. However, this does not solves the
>      >>> deadlock.
>      >>>
>      >>> The deadlock is a bigger concern I think.
>      >>>
>      >>> I think once the call to rte_eal_check_dma_mask uses the
>     maskbits instead
>      >>> of the mask, calling rte_memseg_walk_thread_unsafe avoids the
>     deadlock.
>      >>>
>      >>> Yao, can you try with the attached patch?
>      >>>
>      >>> Hi, Lucero
>      >>>
>      >>> This patch can fix the issue at my side. Thanks a lot
>      >>> for you quick action.
>      >>
>      >> Great!
>      >>
>      >> I will send an official patch with the changes.
>      >
>      > Please, do not forget my other request to better comment functions.
> 
>     Alejandro,
> 
>     This patchset has been merged to stable/17.11 per your request for
>     the last release.
>     You must send a fix to stable/17.11 as well, if you think there's a
>     same issue there.
> 
> 
> The patchset for 17.11 was much more simpler. There have been a lot of 
> changes to the memory code since 17.11, and this problem should not be 
> present in stable 17.11.
> 
> Once I have said that, if there are any reports about a problem with 
> this patchset in 17.11, I will work on it as a priority.
> 
> Thanks.
> 

17.11 will definitely be immune to the deadlock issue since there are no
memseg walks there :) However, it may still have the incorrect mask handling.
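
For readers unsure what a mask versus maskbits mix-up can look like, here is
a small hedged sketch. It is not the code from either branch: the helper
names (dma_mask_from_bits, iova_fits, maskbits_plausible) are hypothetical.
It only assumes what the thread states, namely that the check takes a bit
count such as 39 or 40 and derives the mask as ~((1ULL << maskbits) - 1).
Passing an already computed 64-bit mask where the bit count is expected
gives a value far outside 1..64, which a simple sanity check can catch.

```c
#include <stdint.h>

/* Build the "forbidden bits" mask from a bit count such as 39 or 40. */
static uint64_t dma_mask_from_bits(uint8_t maskbits)
{
	/* Shifting by 64 or more is undefined; no bits are forbidden then. */
	if (maskbits >= 64)
		return 0;

	return ~((1ULL << maskbits) - 1);
}

/* Return 1 if the IOVA is addressable with maskbits bits, 0 otherwise. */
static int iova_fits(uint64_t iova, uint8_t maskbits)
{
	return (iova & dma_mask_from_bits(maskbits)) == 0;
}

/*
 * Sanity check for a caller-supplied value: a bit count is between 1 and
 * 64, whereas a precomputed mask such as (1ULL << 39) - 1 is not.
 */
static int maskbits_plausible(uint64_t value)
{
	return value >= 1 && value <= 64;
}
```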

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-30  3:20                         ` Lin, Xueqin
@ 2018-10-30  9:41                           ` Alejandro Lucero
  2018-10-30 10:33                             ` Lin, Xueqin
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-30  9:41 UTC (permalink / raw)
  To: xueqin.lin
  Cc: lei.a.yao, Thomas Monjalon, dev, Xu, Qian Q, Burakov, Anatoly,
	Ferruh Yigit, Qi Zhang

On Tue, Oct 30, 2018 at 3:20 AM Lin, Xueqin <xueqin.lin@intel.com> wrote:

> Hi Lucero&Thomas,
>
>
>
> Find the patch can’t fix multi-process cases.
>

Hi,

I think this is not about multiprocess in general but about hotplug with
multiprocess, because I can run symmetric_mp successfully with a
secondary process.

Working on this as a priority.

Thanks.


> Steps:
>
> 1. Setup primary process successfully
>
> ./hotplug_mp --proc-type=auto
>
> 2. Fail to setup secondary process
>
> ./hotplug_mp --proc-type=auto
> EAL: Detected 88 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Auto-detected process type: SECONDARY
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23
> Segmentation fault (core dumped)
>
> More information as below:
>
> Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.
> 0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
> 264             for (idx = first; idx < msk->n_masks; idx++) {
> #0  0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264
> #1  0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,
>     used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001
> #2  0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)
>     at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018
> #3  0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,
>     arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589
> #4  0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')
>     at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465
> #5  0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:593
> #6  0x00000000005b9738 in pci_devices_iommu_support_va ()
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:626
> #7  0x00000000005b97a7 in rte_pci_get_iommu_class ()
>     at /root/dpdk/drivers/bus/pci/linux/pci.c:650
> #8  0x000000000058f1ce in rte_bus_get_iommu_class ()
>     at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237
> #9  0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)
>     at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919
> #10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)
>     at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28
>
>
>
>
>
> Best regards,
>
> Xueqin
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Monday, October 29, 2018 9:41 PM
> *To:* Yao, Lei A <lei.a.yao@intel.com>
> *Cc:* Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian
> Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov,
> Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com
> >
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:
>
>
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Monday, October 29, 2018 8:56 PM
> *To:* Thomas Monjalon <thomas@monjalon.net>
> *Cc:* Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <
> qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov,
> Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com
> >
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
> wrote:
>
> 29/10/2018 12:39, Alejandro Lucero:
> > I got a patch that solves a bug when calling rte_eal_dma_mask using the
> > mask instead of the maskbits. However, this does not solves the
> deadlock.
>
> The deadlock is a bigger concern I think.
>
>
>
> I think once the call to rte_eal_check_dma_mask uses the maskbits instead
> of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
>
>
>
> Yao, can you try with the attached patch?
>
>
>
> Hi, Lucero
>
>
>
> This patch can fix the issue at my side. Thanks a lot
>
> for you quick action.
>
>
>
>
>
> Great!
>
>
>
> I will send an official patch with the changes.
>
>
>
> I have to say that I tested the patchset, but I think it was where
> legacy_mem was still there and therefore dynamic memory allocation code not
> used during memory initialization.
>
>
>
> There is something that concerns me though. Using
> rte_memseg_walk_thread_unsafe could be a problem under some situations
> although those situations being unlikely.
>
>
>
> Usually, calling rte_eal_check_dma_mask happens during initialization.
> Then it is safe to use the unsafe function for walking memsegs, but with
> device hotplug and dynamic memory allocation, there exists a potential race
> condition when the primary process is allocating more memory and
> concurrently a device is hotplugged and a secondary process does the device
> initialization. By now, this is just a problem with the NFP, and the
> potential race condition window really unlikely, but I will work on this
> asap.
>
>
>
> BRs
>
> Lei
>
>
>
> > Interestingly, the problem looks like a compiler one. Calling
> > rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
> but if
> > you modify the call like this:
> >
> > -       if (rte_memseg_walk(check_iova, &mask))
> > +       if (!rte_memseg_walk(check_iova, &mask))
> >
> > it works, although the value returned to the invoker changes, of course.
> > But the point here is it should be the same behaviour when calling
> > rte_memseg_walk than before and it is not.
>
> Anyway, the coding style requires to save the return value in a variable,
> instead of nesting the call in an "if" condition.
> And the "if" check should be explicitly != 0 because it is not a real
> boolean.
>
> PS: please do not top post and avoid HTML emails, thanks
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 13:40                       ` Alejandro Lucero
  2018-10-29 14:18                         ` Thomas Monjalon
@ 2018-10-30  3:20                         ` Lin, Xueqin
  2018-10-30  9:41                           ` Alejandro Lucero
  1 sibling, 1 reply; 62+ messages in thread
From: Lin, Xueqin @ 2018-10-30  3:20 UTC (permalink / raw)
  To: Alejandro Lucero, Yao, Lei A, Thomas Monjalon
  Cc: dev, Xu, Qian Q, Burakov, Anatoly, Yigit, Ferruh, Zhang, Qi Z

Hi Lucero&Thomas,

I find that the patch does not fix the multi-process case.
Steps:

1. Set up the primary process successfully

./hotplug_mp --proc-type=auto

2. Fail to set up the secondary process

./hotplug_mp --proc-type=auto

EAL: Detected 88 lcore(s)

EAL: Detected 2 NUMA nodes

EAL: Auto-detected process type: SECONDARY

EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_147212_2bfe08ee88d23

Segmentation fault (core dumped)


More information as below:

Thread 1 "hotplug_mp" received signal SIGSEGV, Segmentation fault.

0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

264             for (idx = first; idx < msk->n_masks; idx++) {

#0  0x0000000000597cfb in find_next (arr=0x7ffff7ff20a4, start=0, used=true)

    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:264

#1  0x0000000000598573 in fbarray_find (arr=0x7ffff7ff20a4, start=0, next=true,

    used=true) at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1001

#2  0x000000000059929b in rte_fbarray_find_next_used (arr=0x7ffff7ff20a4, start=0)

    at /root/dpdk/lib/librte_eal/common/eal_common_fbarray.c:1018

#3  0x000000000058c877 in rte_memseg_walk_thread_unsafe (func=0x58c401 <check_iova>,

    arg=0x7fffffffcc38) at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:589

#4  0x000000000058ce08 in rte_eal_check_dma_mask (maskbits=48 '0')

    at /root/dpdk/lib/librte_eal/common/eal_common_memory.c:465

#5  0x00000000005b96c4 in pci_one_device_iommu_support_va (dev=0x11b3d90)

    at /root/dpdk/drivers/bus/pci/linux/pci.c:593

#6  0x00000000005b9738 in pci_devices_iommu_support_va ()

    at /root/dpdk/drivers/bus/pci/linux/pci.c:626

#7  0x00000000005b97a7 in rte_pci_get_iommu_class ()

    at /root/dpdk/drivers/bus/pci/linux/pci.c:650

#8  0x000000000058f1ce in rte_bus_get_iommu_class ()

    at /root/dpdk/lib/librte_eal/common/eal_common_bus.c:237

#9  0x0000000000577c7a in rte_eal_init (argc=2, argv=0x7fffffffdf98)

    at /root/dpdk/lib/librte_eal/linuxapp/eal/eal.c:919

#10 0x000000000045dd56 in main (argc=2, argv=0x7fffffffdf98)

    at /root/dpdk/examples/multi_process/hotplug_mp/main.c:28


Best regards,
Xueqin

From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 9:41 PM
To: Yao, Lei A <lei.a.yao@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:


From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net> wrote:
29/10/2018 12:39, Alejandro Lucero:
> I got a patch that solves a bug when calling rte_eal_dma_mask using the
> mask instead of the maskbits. However, this does not solves the deadlock.

The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?

Hi, Lucero

This patch can fix the issue at my side. Thanks a lot
for you quick action.


Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think it was where legacy_mem was still there and therefore dynamic memory allocation code not used during memory initialization.

There is something that concerns me though. Using rte_memseg_walk_thread_unsafe could be a problem under some situations although those situations being unlikely.

Usually, calling rte_eal_check_dma_mask happens during initialization. Then it is safe to use the unsafe function for walking memsegs, but with device hotplug and dynamic memory allocation, there exists a potential race condition when the primary process is allocating more memory and concurrently a device is hotplugged and a secondary process does the device initialization. By now, this is just a problem with the NFP, and the potential race condition window really unlikely, but I will work on this asap.

BRs
Lei

> Interestingly, the problem looks like a compiler one. Calling
> rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but if
> you modify the call like this:
>
> -       if (rte_memseg_walk(check_iova, &mask))
> +       if (!rte_memseg_walk(check_iova, &mask))
>
> it works, although the value returned to the invoker changes, of course.
> But the point here is it should be the same behaviour when calling
> rte_memseg_walk than before and it is not.

Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top post and avoid HTML emails, thanks

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 18:54                           ` Yongseok Koh
@ 2018-10-29 19:37                             ` Alejandro Lucero
  2018-10-30 10:10                               ` Burakov, Anatoly
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-29 19:37 UTC (permalink / raw)
  To: Yongseok Koh
  Cc: lei.a.yao, Thomas Monjalon, dev, Xu, Qian Q, xueqin.lin, Burakov,
	Anatoly, Ferruh Yigit

On Mon, Oct 29, 2018 at 6:54 PM Yongseok Koh <yskoh@mellanox.com> wrote:

>
> > On Oct 29, 2018, at 7:18 AM, Thomas Monjalon <thomas@monjalon.net>
> wrote:
> >
> > 29/10/2018 14:40, Alejandro Lucero:
> >> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:
> >>> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> >>> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
> >>> wrote:
> >>>
> >>> 29/10/2018 12:39, Alejandro Lucero:
> >>>> I got a patch that solves a bug when calling rte_eal_dma_mask using
> the
> >>>> mask instead of the maskbits. However, this does not solves the
> >>> deadlock.
> >>>
> >>> The deadlock is a bigger concern I think.
> >>>
> >>> I think once the call to rte_eal_check_dma_mask uses the maskbits
> instead
> >>> of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
> >>>
> >>> Yao, can you try with the attached patch?
> >>>
> >>> Hi, Lucero
> >>>
> >>> This patch can fix the issue at my side. Thanks a lot
> >>> for you quick action.
> >>
> >> Great!
> >>
> >> I will send an official patch with the changes.
> >
> > Please, do not forget my other request to better comment functions.
>
> Alejandro,
>
> This patchset has been merged to stable/17.11 per your request for the
> last release.
> You must send a fix to stable/17.11 as well, if you think there's a same
> issue there.
>
>
The patchset for 17.11 was much simpler. There have been a lot of
changes to the memory code since 17.11, and this problem should not be
present in stable 17.11.

That said, if there are any reports about a problem with this
patchset in 17.11, I will work on it as a priority.

Thanks.


> Thanks,
> Yongseok
>
> >> I have to say that I tested the patchset, but I think it was where
> >> legacy_mem was still there and therefore dynamic memory allocation code
> not
> >> used during memory initialization.
> >>
> >> There is something that concerns me though. Using
> >> rte_memseg_walk_thread_unsafe could be a problem under some situations
> >> although those situations being unlikely.
> >>
> >> Usually, calling rte_eal_check_dma_mask happens during initialization.
> Then
> >> it is safe to use the unsafe function for walking memsegs, but with
> device
> >> hotplug and dynamic memory allocation, there exists a potential race
> >> condition when the primary process is allocating more memory and
> >> concurrently a device is hotplugged and a secondary process does the
> device
> >> initialization. By now, this is just a problem with the NFP, and the
> >> potential race condition window really unlikely, but I will work on this
> >> asap.
> >
> > Yes, this is what concerns me.
> > You can add a comment explaining the unsafe which is not handled.
> >
> >
> >>>> Interestingly, the problem looks like a compiler one. Calling
> >>>> rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
> >>> but if
> >>>> you modify the call like this:
> >>>>
> >>>> -       if (rte_memseg_walk(check_iova, &mask))
> >>>> +       if (!rte_memseg_walk(check_iova, &mask))
> >>>>
> >>>> it works, although the value returned to the invoker changes, of
> course.
> >>>> But the point here is it should be the same behaviour when calling
> >>>> rte_memseg_walk than before and it is not.
> >>>
> >>> Anyway, the coding style requires to save the return value in a
> variable,
> >>> instead of nesting the call in an "if" condition.
> >>> And the "if" check should be explicitly != 0 because it is not a real
> >>> boolean.
> >>>
> >>> PS: please do not top post and avoid HTML emails, thanks
> >>>
> >>>
> >>
> >
> >
> >
> >
> >
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 14:18                         ` Thomas Monjalon
  2018-10-29 14:35                           ` Alejandro Lucero
@ 2018-10-29 18:54                           ` Yongseok Koh
  2018-10-29 19:37                             ` Alejandro Lucero
  2018-10-30 10:11                           ` Burakov, Anatoly
  2 siblings, 1 reply; 62+ messages in thread
From: Yongseok Koh @ 2018-10-29 18:54 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: lei.a.yao, Thomas Monjalon, dev, Xu, Qian Q, xueqin.lin, Burakov,
	Anatoly, Ferruh Yigit


> On Oct 29, 2018, at 7:18 AM, Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> 29/10/2018 14:40, Alejandro Lucero:
>> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:
>>> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>>> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
>>> wrote:
>>> 
>>> 29/10/2018 12:39, Alejandro Lucero:
>>>> I got a patch that solves a bug when calling rte_eal_dma_mask using the
>>>> mask instead of the maskbits. However, this does not solves the
>>> deadlock.
>>> 
>>> The deadlock is a bigger concern I think.
>>> 
>>> I think once the call to rte_eal_check_dma_mask uses the maskbits instead
>>> of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
>>> 
>>> Yao, can you try with the attached patch?
>>> 
>>> Hi, Lucero
>>> 
>>> This patch can fix the issue at my side. Thanks a lot
>>> for you quick action.
>> 
>> Great!
>> 
>> I will send an official patch with the changes.
> 
> Please, do not forget my other request to better comment functions.

Alejandro,

This patchset has been merged to stable/17.11 per your request for the last release.
You must send a fix to stable/17.11 as well, if you think the same issue exists there.

Thanks,
Yongseok

>> I have to say that I tested the patchset, but I think it was where
>> legacy_mem was still there and therefore dynamic memory allocation code not
>> used during memory initialization.
>> 
>> There is something that concerns me though. Using
>> rte_memseg_walk_thread_unsafe could be a problem under some situations
>> although those situations being unlikely.
>> 
>> Usually, calling rte_eal_check_dma_mask happens during initialization. Then
>> it is safe to use the unsafe function for walking memsegs, but with device
>> hotplug and dynamic memory allocation, there exists a potential race
>> condition when the primary process is allocating more memory and
>> concurrently a device is hotplugged and a secondary process does the device
>> initialization. By now, this is just a problem with the NFP, and the
>> potential race condition window really unlikely, but I will work on this
>> asap.
> 
> Yes, this is what concerns me.
> You can add a comment explaining the unsafe which is not handled.
> 
> 
>>>> Interestingly, the problem looks like a compiler one. Calling
>>>> rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
>>> but if
>>>> you modify the call like this:
>>>> 
>>>> -       if (rte_memseg_walk(check_iova, &mask))
>>>> +       if (!rte_memseg_walk(check_iova, &mask))
>>>> 
>>>> it works, although the value returned to the invoker changes, of course.
>>>> But the point here is it should be the same behaviour when calling
>>>> rte_memseg_walk than before and it is not.
>>> 
>>> Anyway, the coding style requires to save the return value in a variable,
>>> instead of nesting the call in an "if" condition.
>>> And the "if" check should be explicitly != 0 because it is not a real
>>> boolean.
>>> 
>>> PS: please do not top post and avoid HTML emails, thanks
>>> 
>>> 
>> 
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 14:18                         ` Thomas Monjalon
@ 2018-10-29 14:35                           ` Alejandro Lucero
  2018-10-29 18:54                           ` Yongseok Koh
  2018-10-30 10:11                           ` Burakov, Anatoly
  2 siblings, 0 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-29 14:35 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Burakov, Anatoly, Ferruh Yigit

On Mon, Oct 29, 2018 at 2:18 PM Thomas Monjalon <thomas@monjalon.net> wrote:

> 29/10/2018 14:40, Alejandro Lucero:
> > On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:
> > > *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> > > On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
> > > wrote:
> > >
> > > 29/10/2018 12:39, Alejandro Lucero:
> > > > I got a patch that solves a bug when calling rte_eal_dma_mask using
> the
> > > > mask instead of the maskbits. However, this does not solves the
> > > deadlock.
> > >
> > > The deadlock is a bigger concern I think.
> > >
> > > I think once the call to rte_eal_check_dma_mask uses the maskbits
> instead
> > > of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
> > >
> > > Yao, can you try with the attached patch?
> > >
> > > Hi, Lucero
> > >
> > > This patch can fix the issue at my side. Thanks a lot
> > > for you quick action.
> >
> > Great!
> >
> > I will send an official patch with the changes.
>
> Please, do not forget my other request to better comment functions.
>
>
>
Sure.


> > I have to say that I tested the patchset, but I think it was where
> > legacy_mem was still there and therefore dynamic memory allocation code
> not
> > used during memory initialization.
> >
> > There is something that concerns me though. Using
> > rte_memseg_walk_thread_unsafe could be a problem under some situations
> > although those situations being unlikely.
> >
> > Usually, calling rte_eal_check_dma_mask happens during initialization.
> Then
> > it is safe to use the unsafe function for walking memsegs, but with
> device
> > hotplug and dynamic memory allocation, there exists a potential race
> > condition when the primary process is allocating more memory and
> > concurrently a device is hotplugged and a secondary process does the
> device
> > initialization. By now, this is just a problem with the NFP, and the
> > potential race condition window really unlikely, but I will work on this
> > asap.
>
> Yes, this is what concerns me.
> You can add a comment explaining the unsafe which is not handled.
>
>
I'll do.

Thanks!


>
> > > > Interestingly, the problem looks like a compiler one. Calling
> > > > rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
> > > but if
> > > > you modify the call like this:
> > > >
> > > > -       if (rte_memseg_walk(check_iova, &mask))
> > > > +       if (!rte_memseg_walk(check_iova, &mask))
> > > >
> > > > it works, although the value returned to the invoker changes, of
> course.
> > > > But the point here is it should be the same behaviour when calling
> > > > rte_memseg_walk than before and it is not.
> > >
> > > Anyway, the coding style requires to save the return value in a
> variable,
> > > instead of nesting the call in an "if" condition.
> > > And the "if" check should be explicitly != 0 because it is not a real
> > > boolean.
> > >
> > > PS: please do not top post and avoid HTML emails, thanks
> > >
> > >
> >
>
>
>
>
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 13:40                       ` Alejandro Lucero
@ 2018-10-29 14:18                         ` Thomas Monjalon
  2018-10-29 14:35                           ` Alejandro Lucero
                                             ` (2 more replies)
  2018-10-30  3:20                         ` Lin, Xueqin
  1 sibling, 3 replies; 62+ messages in thread
From: Thomas Monjalon @ 2018-10-29 14:18 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Burakov, Anatoly, Ferruh Yigit

29/10/2018 14:40, Alejandro Lucero:
> On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:
> > *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> > On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
> > wrote:
> >
> > 29/10/2018 12:39, Alejandro Lucero:
> > > I got a patch that solves a bug when calling rte_eal_dma_mask using the
> > > mask instead of the maskbits. However, this does not solves the
> > deadlock.
> >
> > The deadlock is a bigger concern I think.
> >
> > I think once the call to rte_eal_check_dma_mask uses the maskbits instead
> > of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
> >
> > Yao, can you try with the attached patch?
> >
> > Hi, Lucero
> >
> > This patch can fix the issue at my side. Thanks a lot
> > for you quick action.
> 
> Great!
> 
> I will send an official patch with the changes.

Please, do not forget my other request to better comment functions.


> I have to say that I tested the patchset, but I think it was where
> legacy_mem was still there and therefore dynamic memory allocation code not
> used during memory initialization.
> 
> There is something that concerns me though. Using
> rte_memseg_walk_thread_unsafe could be a problem under some situations
> although those situations being unlikely.
> 
> Usually, calling rte_eal_check_dma_mask happens during initialization. Then
> it is safe to use the unsafe function for walking memsegs, but with device
> hotplug and dynamic memory allocation, there exists a potential race
> condition when the primary process is allocating more memory and
> concurrently a device is hotplugged and a secondary process does the device
> initialization. By now, this is just a problem with the NFP, and the
> potential race condition window really unlikely, but I will work on this
> asap.

Yes, this is what concerns me.
You can add a comment explaining the unsafe case which is not handled.


> > > Interestingly, the problem looks like a compiler one. Calling
> > > rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
> > but if
> > > you modify the call like this:
> > >
> > > -       if (rte_memseg_walk(check_iova, &mask))
> > > +       if (!rte_memseg_walk(check_iova, &mask))
> > >
> > > it works, although the value returned to the invoker changes, of course.
> > > But the point here is it should be the same behaviour when calling
> > > rte_memseg_walk than before and it is not.
> >
> > Anyway, the coding style requires to save the return value in a variable,
> > instead of nesting the call in an "if" condition.
> > And the "if" check should be explicitly != 0 because it is not a real
> > boolean.
> >
> > PS: please do not top post and avoid HTML emails, thanks
> >
> >
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 13:18                     ` Yao, Lei A
@ 2018-10-29 13:40                       ` Alejandro Lucero
  2018-10-29 14:18                         ` Thomas Monjalon
  2018-10-30  3:20                         ` Lin, Xueqin
  0 siblings, 2 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-29 13:40 UTC (permalink / raw)
  To: lei.a.yao
  Cc: Thomas Monjalon, dev, Xu, Qian Q, xueqin.lin, Burakov, Anatoly,
	Ferruh Yigit

On Mon, Oct 29, 2018 at 1:18 PM Yao, Lei A <lei.a.yao@intel.com> wrote:

>
>
>
>
> *From:* Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> *Sent:* Monday, October 29, 2018 8:56 PM
> *To:* Thomas Monjalon <thomas@monjalon.net>
> *Cc:* Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <
> qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov,
> Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com
> >
> *Subject:* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
>
>
>
>
> On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
> wrote:
>
> 29/10/2018 12:39, Alejandro Lucero:
> > I got a patch that solves a bug when calling rte_eal_dma_mask using the
> > mask instead of the maskbits. However, this does not solves the
> deadlock.
>
> The deadlock is a bigger concern I think.
>
>
>
> I think once the call to rte_eal_check_dma_mask uses the maskbits instead
> of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.
>
>
>
> Yao, can you try with the attached patch?
>
>
>
> Hi, Lucero
>
>
>
> This patch can fix the issue at my side. Thanks a lot
>
> for you quick action.
>
>
>

Great!

I will send an official patch with the changes.

I have to say that I tested the patchset, but I think that was when
legacy_mem was still there, and therefore the dynamic memory allocation code
was not used during memory initialization.

There is something that concerns me though. Using
rte_memseg_walk_thread_unsafe could be a problem in some situations,
although those situations are unlikely.

Usually, rte_eal_check_dma_mask is called during initialization, where it is
safe to use the unsafe function for walking memsegs. With device hotplug and
dynamic memory allocation, however, there is a potential race condition when
the primary process is allocating more memory while, concurrently, a device
is hotplugged and a secondary process does the device initialization. For
now, this is just a problem with the NFP, and the race condition is really
unlikely, but I will work on this asap.
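
To make the locking concern concrete, here is a toy sketch. It assumes
that rte_memseg_walk takes the memory hotplug lock and that
rte_memseg_walk_thread_unsafe does not. Everything below (toy_lock,
walk_locked, walk_unsafe) is hypothetical and single-threaded, not EAL
code; the failed acquire stands in for what would be a deadlock with a
real non-recursive lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a non-recursive lock: it only records whether it is held. */
struct toy_lock {
	bool held;
};

/* Returns 0 on success, -1 if already held (a real lock would block here). */
static int
toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -1;
	l->held = true;
	return 0;
}

static void
toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Walks the segments without taking any lock; the caller must ensure safety. */
static int
walk_unsafe(const int *segs, size_t n, int *sum)
{
	size_t i;

	assert(segs != NULL && sum != NULL);
	for (i = 0; i < n; i++)
		*sum += segs[i];
	return 0;
}

/* Walks the segments under the lock; fails if the lock is already held. */
static int
walk_locked(struct toy_lock *l, const int *segs, size_t n, int *sum)
{
	int ret;

	ret = toy_lock_acquire(l);
	if (ret != 0)
		return -1; /* stands in for the deadlock */
	ret = walk_unsafe(segs, n, sum);
	toy_lock_release(l);
	return ret;
}
```

With the lock already held by the caller, walk_locked fails while
walk_unsafe completes. The price of the unsafe variant is that nothing
protects the walk if another process changes the segments meanwhile,
which is the race described above.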


> BRs
>
> Lei
>
>
>
> > Interestingly, the problem looks like a compiler one. Calling
> > rte_memseg_walk does not return when calling inside rt_eal_dma_mask,
> but if
> > you modify the call like this:
> >
> > -       if (rte_memseg_walk(check_iova, &mask))
> > +       if (!rte_memseg_walk(check_iova, &mask))
> >
> > it works, although the value returned to the invoker changes, of course.
> > But the point here is it should be the same behaviour when calling
> > rte_memseg_walk than before and it is not.
>
> Anyway, the coding style requires to save the return value in a variable,
> instead of nesting the call in an "if" condition.
> And the "if" check should be explicitly != 0 because it is not a real
> boolean.
>
> PS: please do not top post and avoid HTML emails, thanks
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 12:55                   ` Alejandro Lucero
@ 2018-10-29 13:18                     ` Yao, Lei A
  2018-10-29 13:40                       ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Yao, Lei A @ 2018-10-29 13:18 UTC (permalink / raw)
  To: Alejandro Lucero, Thomas Monjalon
  Cc: dev, Xu, Qian Q, Lin, Xueqin, Burakov, Anatoly, Yigit, Ferruh



From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask


On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net> wrote:
29/10/2018 12:39, Alejandro Lucero:
> I got a patch that solves a bug when calling rte_eal_dma_mask using the
> mask instead of the maskbits. However, this does not solves the deadlock.

The deadlock is a bigger concern I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?

Hi, Lucero

This patch fixes the issue on my side. Thanks a lot
for your quick action.

BRs
Lei

> Interestingly, the problem looks like a compiler one. Calling
> rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but if
> you modify the call like this:
>
> -       if (rte_memseg_walk(check_iova, &mask))
> +       if (!rte_memseg_walk(check_iova, &mask))
>
> it works, although the value returned to the invoker changes, of course.
> But the point here is it should be the same behaviour when calling
> rte_memseg_walk than before and it is not.

Anyway, the coding style requires to save the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top post and avoid HTML emails, thanks


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 11:46                 ` Thomas Monjalon
@ 2018-10-29 12:55                   ` Alejandro Lucero
  2018-10-29 13:18                     ` Yao, Lei A
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-29 12:55 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Burakov, Anatoly, Ferruh Yigit

On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <thomas@monjalon.net>
wrote:

> 29/10/2018 12:39, Alejandro Lucero:
> > I got a patch that solves a bug when calling rte_eal_dma_mask using the
> > mask instead of the maskbits. However, this does not solves the deadlock.
>
> The deadlock is a bigger concern I think.
>
>
I think once the call to rte_eal_check_dma_mask uses the maskbits instead
of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?
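
To show the maskbits versus mask distinction in isolation, a minimal
sketch follows. It assumes the check takes a bit count in a uint8_t and
derives the mask internally. The helper names (valid_bits_mask,
iova_fits) are hypothetical and this is not the EAL code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask of the address bits a device can drive. */
static uint64_t
valid_bits_mask(uint8_t maskbits)
{
	assert(maskbits > 0 && maskbits <= 64);
	if (maskbits == 64)
		return UINT64_MAX; /* avoid an undefined shift by 64 */
	return (UINT64_C(1) << maskbits) - 1;
}

/* Returns 0 if the IOVA uses only the low maskbits bits, -1 otherwise. */
static int
iova_fits(uint64_t iova, uint8_t maskbits)
{
	uint64_t forbidden = ~valid_bits_mask(maskbits);

	if ((iova & forbidden) != 0)
		return -1;
	return 0;
}
```

Under this assumption, a caller has to pass the bit count (39, 40, ...).
If a 64-bit mask such as 0xFFFFFFFFFF were passed where maskbits is
expected, it would be silently truncated to 8 bits (0xFF here) and the
check would no longer test the intended range.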


> > Interestingly, the problem looks like a compiler one. Calling
> > rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but
> if
> > you modify the call like this:
> >
> > -       if (rte_memseg_walk(check_iova, &mask))
> > +       if (!rte_memseg_walk(check_iova, &mask))
> >
> > it works, although the value returned to the invoker changes, of course.
> > But the point here is it should be the same behaviour when calling
> > rte_memseg_walk than before and it is not.
>
> Anyway, the coding style requires to save the return value in a variable,
> instead of nesting the call in an "if" condition.
> And the "if" check should be explicitly != 0 because it is not a real
> boolean.
>
> PS: please do not top post and avoid HTML emails, thanks
>
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 11:39               ` Alejandro Lucero
@ 2018-10-29 11:46                 ` Thomas Monjalon
  2018-10-29 12:55                   ` Alejandro Lucero
  2018-10-30 10:18                 ` Burakov, Anatoly
  1 sibling, 1 reply; 62+ messages in thread
From: Thomas Monjalon @ 2018-10-29 11:46 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Burakov, Anatoly, Ferruh Yigit

29/10/2018 12:39, Alejandro Lucero:
> I got a patch that solves a bug when calling rte_eal_dma_mask using the
> mask instead of the maskbits. However, this does not solves the deadlock.

The deadlock is a bigger concern I think.

> Interestingly, the problem looks like a compiler one. Calling
> rte_memseg_walk does not return when calling inside rt_eal_dma_mask, but if
> you modify the call like this:
> 
> -       if (rte_memseg_walk(check_iova, &mask))
> +       if (!rte_memseg_walk(check_iova, &mask))
> 
> it works, although the value returned to the invoker changes, of course.
> But the point here is it should be the same behaviour when calling
> rte_memseg_walk than before and it is not.

Anyway, the coding style requires saving the return value in a variable,
instead of nesting the call in an "if" condition.
And the "if" check should compare explicitly with != 0 because the return
value is not a real boolean.
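
A minimal sketch of the style described above, using hypothetical names (walk_values, above_limit and check_values are illustrations, not DPDK functions). The walk function is assumed to return 0 when every element was visited, 1 when the callback stopped the walk and -1 on error, which is why its result is not a boolean.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical walk function: returns 0 if all values were visited,
 * 1 if the callback stopped the walk, -1 on error.
 */
int walk_values(const uint64_t *vals, int n,
		int (*cb)(uint64_t val, void *arg), void *arg)
{
	int i;
	int ret;

	if (vals == NULL || cb == NULL)
		return -1;

	for (i = 0; i < n; i++) {
		ret = cb(vals[i], arg);
		if (ret < 0)
			return -1;
		if (ret > 0)
			return 1;
	}
	return 0;
}

/* Callback: stop the walk at the first value above the limit. */
int above_limit(uint64_t val, void *arg)
{
	const uint64_t *limit = arg;

	return val > *limit ? 1 : 0;
}

/* Returns 0 if every value is within the limit, -1 otherwise. */
int check_values(const uint64_t *vals, int n, uint64_t limit)
{
	int ret;

	/* save the return value first ... */
	ret = walk_values(vals, n, above_limit, &limit);
	/* ... then compare it explicitly against 0 */
	if (ret != 0)
		return -1;
	return 0;
}
```

Written this way, both the "stopped by callback" result (1) and the error result (-1) are visibly treated as failures, which a bare "if (call())" or "if (!call())" hides.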

PS: please do not top post and avoid HTML emails, thanks

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 10:15             ` Alejandro Lucero
@ 2018-10-29 11:39               ` Alejandro Lucero
  2018-10-29 11:46                 ` Thomas Monjalon
  2018-10-30 10:18                 ` Burakov, Anatoly
  0 siblings, 2 replies; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-29 11:39 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Burakov, Anatoly, Ferruh Yigit

I have a patch that fixes a bug where rte_eal_check_dma_mask was called
with the mask instead of the maskbits. However, this does not solve the
deadlock.

Interestingly, the problem looks like a compiler one. The call to
rte_memseg_walk inside rte_eal_check_dma_mask does not return, but if
you modify the call like this:

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 12dcedf5c..69b26e464 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -462,7 +462,7 @@ rte_eal_check_dma_mask(uint8_t maskbits)
        /* create dma mask */
        mask = ~((1ULL << maskbits) - 1);

-       if (rte_memseg_walk(check_iova, &mask))
+       if (!rte_memseg_walk(check_iova, &mask))
                /*
                 * Dma mask precludes hugepage usage.
                 * This device can not be used and we do not need to keep

it works, although the value returned to the caller changes, of course.
But the point here is that the call to rte_memseg_walk should behave the
same as before, and it does not.


Anatoly, maybe you can see something I cannot.



On Mon, Oct 29, 2018 at 10:15 AM Alejandro Lucero <
alejandro.lucero@netronome.com> wrote:

> Apologies. Forget my previous email. Just using the wrong repo.
>
> Looking at solving this asap.
>
> On Mon, Oct 29, 2018 at 10:11 AM Alejandro Lucero <
> alejandro.lucero@netronome.com> wrote:
>
>> I know what is going on.
>>
>> In patchset version 3 I forgot to remove an old code. Anatoly spotted
>> that and I was going to send another version for fixing it. Before sending
>> the new version I saw that report about a problem with dma_mask and I'm
>> afraid I did not send another version with the fix ...
>>
>> Yao, can you try with next patch?:
>>
>> *diff --git a/lib/librte_eal/common/eal_common_memory.c
>> b/lib/librte_eal/common/eal_common_memory.c*
>>
>> *index ef656bbad..26adf46c0 100644*
>>
>> *--- a/lib/librte_eal/common/eal_common_memory.c*
>>
>> *+++ b/lib/librte_eal/common/eal_common_memory.c*
>>
>> @@ -458,10 +458,6 @@ rte_eal_check_dma_mask(uint8_t maskbits)
>>
>>                 return -1;
>>
>>         }
>>
>>
>>
>> -       /* keep the more restricted maskbit */
>>
>> -       if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
>>
>> -               mcfg->dma_maskbits = maskbits;
>>
>> -
>>
>>         /* create dma mask */
>>
>>         mask = ~((1ULL << maskbits) - 1);
>>
>> On Mon, Oct 29, 2018 at 9:48 AM Thomas Monjalon <thomas@monjalon.net>
>> wrote:
>>
>>> 29/10/2018 10:36, Yao, Lei A:
>>> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
>>> > > 29/10/2018 09:23, Yao, Lei A:
>>> > > > Hi, Lucero, Thomas
>>> > > >
>>> > > > This patch set will cause deadlock during memory initialization.
>>> > > > rte_memseg_walk and try_expand_heap both will lock
>>> > > > the file &mcfg->memory_hotplug_lock. So dead lock will occur.
>>> > > >
>>> > > > #0       rte_memseg_walk
>>> > > > #1  <-rte_eal_check_dma_mask
>>> > > > #2  <-alloc_pages_on_heap
>>> > > > #3  <-try_expand_heap_primary
>>> > > > #4  <-try_expand_heap
>>> > > >
>>> > > > Log as following:
>>> > > > EAL: TSC frequency is ~2494156 KHz
>>> > > > EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
>>> > > > [New Thread 0x7ffff5e0d700 (LWP 330350)]
>>> > > > EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
>>> > > > EAL: Trying to obtain current memory policy.
>>> > > > EAL: Setting policy MPOL_PREFERRED for socket 0
>>> > > > EAL: Restoring previous memory policy: 0
>>> > > >
>>> > > > Could you have a check on this? A lot of test cases in our
>>> validation
>>> > > > team fail because of this. Thanks a lot!
>>> > >
>>> > > Can we just call rte_memseg_walk_thread_unsafe()?
>>> > >
>>> > > +Cc Anatoly
>>> >
>>> > Hi, Thomas
>>> >
>>> > I change to rte_memseg_walk_thread_unsafe(), still
>>> > Can't work.
>>> >
>>> > EAL: Setting policy MPOL_PREFERRED for socket 0
>>> > EAL: Restoring previous memory policy: 0
>>> > EAL: memseg iova 140000000, len 40000000, out of range
>>> > EAL:    using dma mask ffffffffffffffff
>>> > EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
>>> > EAL: Trying to obtain current memory policy.
>>> > EAL: Setting policy MPOL_PREFERRED for socket 1
>>> > EAL: Restoring previous memory policy: 0
>>> > EAL: memseg iova 1bc0000000, len 40000000, out of range
>>> > EAL:    using dma mask ffffffffffffffff
>>> > EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
>>> > error allocating rte services array
>>> > EAL: FATAL: rte_service_init() failed
>>> > EAL: rte_service_init() failed
>>> > PANIC in main():
>>>
>>> I think it is showing there are at least 2 issues:
>>>         1/ deadlock
>>>         2/ allocation does not comply with mask check (out of range)
>>>
>>>
>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29 10:11           ` Alejandro Lucero
@ 2018-10-29 10:15             ` Alejandro Lucero
  2018-10-29 11:39               ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-29 10:15 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Burakov, Anatoly, Ferruh Yigit

Apologies. Forget my previous email. Just using the wrong repo.

Looking at solving this asap.

On Mon, Oct 29, 2018 at 10:11 AM Alejandro Lucero <
alejandro.lucero@netronome.com> wrote:

> I know what is going on.
>
> In patchset version 3 I forgot to remove an old code. Anatoly spotted that
> and I was going to send another version for fixing it. Before sending the
> new version I saw that report about a problem with dma_mask and I'm afraid
> I did not send another version with the fix ...
>
> Yao, can you try with next patch?:
>
> *diff --git a/lib/librte_eal/common/eal_common_memory.c
> b/lib/librte_eal/common/eal_common_memory.c*
>
> *index ef656bbad..26adf46c0 100644*
>
> *--- a/lib/librte_eal/common/eal_common_memory.c*
>
> *+++ b/lib/librte_eal/common/eal_common_memory.c*
>
> @@ -458,10 +458,6 @@ rte_eal_check_dma_mask(uint8_t maskbits)
>
>                 return -1;
>
>         }
>
>
>
> -       /* keep the more restricted maskbit */
>
> -       if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
>
> -               mcfg->dma_maskbits = maskbits;
>
> -
>
>         /* create dma mask */
>
>         mask = ~((1ULL << maskbits) - 1);
>
> On Mon, Oct 29, 2018 at 9:48 AM Thomas Monjalon <thomas@monjalon.net>
> wrote:
>
>> 29/10/2018 10:36, Yao, Lei A:
>> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
>> > > 29/10/2018 09:23, Yao, Lei A:
>> > > > Hi, Lucero, Thomas
>> > > >
>> > > > This patch set will cause deadlock during memory initialization.
>> > > > rte_memseg_walk and try_expand_heap both will lock
>> > > > the file &mcfg->memory_hotplug_lock. So dead lock will occur.
>> > > >
>> > > > #0       rte_memseg_walk
>> > > > #1  <-rte_eal_check_dma_mask
>> > > > #2  <-alloc_pages_on_heap
>> > > > #3  <-try_expand_heap_primary
>> > > > #4  <-try_expand_heap
>> > > >
>> > > > Log as following:
>> > > > EAL: TSC frequency is ~2494156 KHz
>> > > > EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
>> > > > [New Thread 0x7ffff5e0d700 (LWP 330350)]
>> > > > EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
>> > > > EAL: Trying to obtain current memory policy.
>> > > > EAL: Setting policy MPOL_PREFERRED for socket 0
>> > > > EAL: Restoring previous memory policy: 0
>> > > >
>> > > > Could you have a check on this? A lot of test cases in our
>> validation
>> > > > team fail because of this. Thanks a lot!
>> > >
>> > > Can we just call rte_memseg_walk_thread_unsafe()?
>> > >
>> > > +Cc Anatoly
>> >
>> > Hi, Thomas
>> >
>> > I change to rte_memseg_walk_thread_unsafe(), still
>> > Can't work.
>> >
>> > EAL: Setting policy MPOL_PREFERRED for socket 0
>> > EAL: Restoring previous memory policy: 0
>> > EAL: memseg iova 140000000, len 40000000, out of range
>> > EAL:    using dma mask ffffffffffffffff
>> > EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
>> > EAL: Trying to obtain current memory policy.
>> > EAL: Setting policy MPOL_PREFERRED for socket 1
>> > EAL: Restoring previous memory policy: 0
>> > EAL: memseg iova 1bc0000000, len 40000000, out of range
>> > EAL:    using dma mask ffffffffffffffff
>> > EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
>> > error allocating rte services array
>> > EAL: FATAL: rte_service_init() failed
>> > EAL: rte_service_init() failed
>> > PANIC in main():
>>
>> I think it is showing there are at least 2 issues:
>>         1/ deadlock
>>         2/ allocation does not comply with mask check (out of range)
>>
>>
>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29  9:48         ` Thomas Monjalon
@ 2018-10-29 10:11           ` Alejandro Lucero
  2018-10-29 10:15             ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-29 10:11 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Burakov, Anatoly, Ferruh Yigit

I know what is going on.

In patchset version 3 I forgot to remove some old code. Anatoly spotted that
and I was going to send another version fixing it. Before sending the
new version I saw that report about a problem with dma_mask, and I'm afraid
I never sent the version with the fix ...

Yao, can you try the following patch?

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index ef656bbad..26adf46c0 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -458,10 +458,6 @@ rte_eal_check_dma_mask(uint8_t maskbits)
                return -1;
        }

-       /* keep the more restricted maskbit */
-       if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
-               mcfg->dma_maskbits = maskbits;
-
        /* create dma mask */
        mask = ~((1ULL << maskbits) - 1);

On Mon, Oct 29, 2018 at 9:48 AM Thomas Monjalon <thomas@monjalon.net> wrote:

> 29/10/2018 10:36, Yao, Lei A:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > 29/10/2018 09:23, Yao, Lei A:
> > > > Hi, Lucero, Thomas
> > > >
> > > > This patch set will cause deadlock during memory initialization.
> > > > rte_memseg_walk and try_expand_heap both will lock
> > > > the file &mcfg->memory_hotplug_lock. So dead lock will occur.
> > > >
> > > > #0       rte_memseg_walk
> > > > #1  <-rte_eal_check_dma_mask
> > > > #2  <-alloc_pages_on_heap
> > > > #3  <-try_expand_heap_primary
> > > > #4  <-try_expand_heap
> > > >
> > > > Log as following:
> > > > EAL: TSC frequency is ~2494156 KHz
> > > > EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
> > > > [New Thread 0x7ffff5e0d700 (LWP 330350)]
> > > > EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
> > > > EAL: Trying to obtain current memory policy.
> > > > EAL: Setting policy MPOL_PREFERRED for socket 0
> > > > EAL: Restoring previous memory policy: 0
> > > >
> > > > Could you have a check on this? A lot of test cases in our validation
> > > > team fail because of this. Thanks a lot!
> > >
> > > Can we just call rte_memseg_walk_thread_unsafe()?
> > >
> > > +Cc Anatoly
> >
> > Hi, Thomas
> >
> > I change to rte_memseg_walk_thread_unsafe(), still
> > Can't work.
> >
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> > EAL: memseg iova 140000000, len 40000000, out of range
> > EAL:    using dma mask ffffffffffffffff
> > EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 1
> > EAL: Restoring previous memory policy: 0
> > EAL: memseg iova 1bc0000000, len 40000000, out of range
> > EAL:    using dma mask ffffffffffffffff
> > EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
> > error allocating rte services array
> > EAL: FATAL: rte_service_init() failed
> > EAL: rte_service_init() failed
> > PANIC in main():
>
> I think it is showing there are at least 2 issues:
>         1/ deadlock
>         2/ allocation does not comply with mask check (out of range)
>
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29  9:36       ` Yao, Lei A
@ 2018-10-29  9:48         ` Thomas Monjalon
  2018-10-29 10:11           ` Alejandro Lucero
  0 siblings, 1 reply; 62+ messages in thread
From: Thomas Monjalon @ 2018-10-29  9:48 UTC (permalink / raw)
  To: Yao, Lei A
  Cc: Alejandro Lucero, dev, Xu, Qian Q, Lin, Xueqin, Burakov, Anatoly,
	ferruh.yigit

29/10/2018 10:36, Yao, Lei A:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 29/10/2018 09:23, Yao, Lei A:
> > > Hi, Lucero, Thomas
> > >
> > > This patch set will cause deadlock during memory initialization.
> > > rte_memseg_walk and try_expand_heap both will lock
> > > the file &mcfg->memory_hotplug_lock. So dead lock will occur.
> > >
> > > #0       rte_memseg_walk
> > > #1  <-rte_eal_check_dma_mask
> > > #2  <-alloc_pages_on_heap
> > > #3  <-try_expand_heap_primary
> > > #4  <-try_expand_heap
> > >
> > > Log as following:
> > > EAL: TSC frequency is ~2494156 KHz
> > > EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
> > > [New Thread 0x7ffff5e0d700 (LWP 330350)]
> > > EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
> > > EAL: Trying to obtain current memory policy.
> > > EAL: Setting policy MPOL_PREFERRED for socket 0
> > > EAL: Restoring previous memory policy: 0
> > >
> > > Could you have a check on this? A lot of test cases in our validation
> > > team fail because of this. Thanks a lot!
> > 
> > Can we just call rte_memseg_walk_thread_unsafe()?
> > 
> > +Cc Anatoly
> 
> Hi, Thomas
> 
> I change to rte_memseg_walk_thread_unsafe(), still
> Can't work. 
> 
> EAL: Setting policy MPOL_PREFERRED for socket 0
> EAL: Restoring previous memory policy: 0
> EAL: memseg iova 140000000, len 40000000, out of range
> EAL:    using dma mask ffffffffffffffff
> EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
> EAL: Trying to obtain current memory policy.
> EAL: Setting policy MPOL_PREFERRED for socket 1
> EAL: Restoring previous memory policy: 0
> EAL: memseg iova 1bc0000000, len 40000000, out of range
> EAL:    using dma mask ffffffffffffffff
> EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
> error allocating rte services array
> EAL: FATAL: rte_service_init() failed
> EAL: rte_service_init() failed
> PANIC in main():

I think it is showing there are at least 2 issues:
	1/ deadlock
	2/ allocation does not comply with mask check (out of range)
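
The "out of range" lines in the log can be reproduced with a small sketch of the mask arithmetic. It reuses the formula visible in the patch context, mask = ~((1ULL << maskbits) - 1); everything else (the function names, SKETCH_MAX_MASK_BITS, and checking the last byte of the segment) is an assumption made for illustration, not the DPDK code. One reading of "using dma mask ffffffffffffffff" is that the check ran with maskbits equal to 0, in which case every non-zero IOVA is rejected.

```c
#include <stdint.h>

/* Assumed upper bound so that the shift below stays well defined. */
#define SKETCH_MAX_MASK_BITS 63

/*
 * Build the inverted DMA mask: bits set are the ones a device limited
 * to 'maskbits' address bits cannot use. Returns 0 on success, -1 if
 * maskbits is too large.
 */
int dma_mask_from_bits(uint8_t maskbits, uint64_t *mask)
{
	if (maskbits > SKETCH_MAX_MASK_BITS)
		return -1;
	*mask = ~((1ULL << maskbits) - 1);
	return 0;
}

/*
 * Returns 1 if the last byte of [iova, iova + len) uses any address
 * bit covered by the inverted mask, 0 otherwise.
 */
int iova_out_of_range(uint64_t iova, uint64_t len, uint64_t mask)
{
	uint64_t last = iova + len - 1;

	return (last & mask) != 0 ? 1 : 0;
}
```

With maskbits 0 the mask is all ones and the segment at iova 140000000, len 40000000 from the log is reported out of range. With maskbits 40 the same segment, and the one at 1bc0000000, both fit.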

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29  9:25         ` Alejandro Lucero
@ 2018-10-29  9:44           ` Yao, Lei A
  0 siblings, 0 replies; 62+ messages in thread
From: Yao, Lei A @ 2018-10-29  9:44 UTC (permalink / raw)
  To: Alejandro Lucero, Thomas Monjalon
  Cc: dev, Xu, Qian Q, Lin, Xueqin, Burakov, Anatoly, Yigit, Ferruh,
	Richardson, Bruce

Hi, Lucero

My server info:
Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
Hugepage: 1G
Kernel: 4.15.0
OS: Ubuntu

Steps are simple:

1. Bind one i40e/ixgbe NIC to igb_uio

2. Launch testpmd:

./x86_64-native-linuxapp-gcc/app/testpmd -c 0x03 -n 4 --log-level=eal,8 -- -i

BRs
Lei


From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
Sent: Monday, October 29, 2018 5:26 PM
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Yao, Lei A <lei.a.yao@intel.com>; dev <dev@dpdk.org>; Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask

Can we have the configuration triggering this issue?

On Mon, Oct 29, 2018 at 9:07 AM Thomas Monjalon <thomas@monjalon.net> wrote:
One more comment about this issue,

There was no reply to the question asked by Alejandro on October 11th:
        http://mails.dpdk.org/archives/dev/2018-October/115402.html
and there were no more reviews despite all my requests:
        http://mails.dpdk.org/archives/dev/2018-October/117475.html
Without any more comment, I had to apply the patchset.

Now we need to find a solution. Please suggest.


29/10/2018 09:42, Thomas Monjalon:
> 29/10/2018 09:23, Yao, Lei A:
> > Hi, Lucero, Thomas
> >
> > This patch set will cause deadlock during memory initialization.
> > rte_memseg_walk and try_expand_heap both will lock
> > the file &mcfg->memory_hotplug_lock. So dead lock will occur.
> >
> > #0       rte_memseg_walk
> > #1  <-rte_eal_check_dma_mask
> > #2  <-alloc_pages_on_heap
> > #3  <-try_expand_heap_primary
> > #4  <-try_expand_heap
> >
> > Log as following:
> > EAL: TSC frequency is ~2494156 KHz
> > EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
> > [New Thread 0x7ffff5e0d700 (LWP 330350)]
> > EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> >
> > Could you have a check on this? A lot of test cases in our validation
> > team fail because of this. Thanks a lot!
>
> Can we just call rte_memseg_walk_thread_unsafe()?
>
> +Cc Anatoly
>
>
> > From: dev [mailto:dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>] On Behalf Of Thomas Monjalon
> > > 05/10/2018 14:45, Alejandro Lucero:
> > > > I sent a patchset about this to be applied on 17.11 stable. The memory
> > > > code has had main changes since that version, so here it is the patchset
> > > > adjusted to current master repo.
> > > >
> > > > This patchset adds, mainly, a check for ensuring IOVAs are within a
> > > > restricted range due to addressing limitations with some devices. There
> > > > are two known cases: NFP and IOMMU VT-d emulation.
> > > >
> > > > With this check IOVAs out of range are detected and PMDs can abort
> > > > initialization. For the VT-d case, IOVA VA mode is allowed as long as
> > > > IOVAs are within the supported range, avoiding to forbid IOVA VA by
> > > > default.
> > > >
> > > > For the addressing limitations known cases, there are just 40(NFP) or
> > > > 39(VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
> > > > imply 1TB(NFP) or 512M(VT-d) as upper limits, which is likely enough for
> > > > most systems. With machines using more memory, the added check will
> > > > ensure IOVAs within the range.
> > > >
> > > > With IOVA VA, and because the way the Linux kernel serves mmap calls
> > > > in 64 bits systems, 39 or 40 bits are not enough. It is possible to
> > > > give an address hint with a lower starting address than the default one
> > > > used by the kernel, and then ensuring the mmap uses that hint or hint plus
> > > > some offset. With 64 bits systems, the process virtual address space is
> > > > large enoguh for doing the hugepages mmaping within the supported
> > > range
> > > > when those addressing limitations exist. This patchset also adds a change
> > > > for using such a hint making the use of IOVA VA a more than likely
> > > > possibility when there are those addressing limitations.
> > > >
> > > > The check is not done by default but just when it is required. This
> > > > patchset adds the check for NFP initialization and for setting the IOVA
> > > > mode is an emulated VT-d is detected. Also, because the recent patchset
> > > > adding dynamic memory allocation, the check is also invoked for ensuring
> > > > the new memsegs are within the required range.
> > > >
> > > > This patchset could be applied to stable 18.05.
> > >
> > > Applied, thanks



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29  8:42     ` Thomas Monjalon
  2018-10-29  9:07       ` Thomas Monjalon
@ 2018-10-29  9:36       ` Yao, Lei A
  2018-10-29  9:48         ` Thomas Monjalon
  1 sibling, 1 reply; 62+ messages in thread
From: Yao, Lei A @ 2018-10-29  9:36 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Alejandro Lucero, dev, Xu, Qian Q, Lin, Xueqin, Burakov, Anatoly


> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Monday, October 29, 2018 4:43 PM
> To: Yao, Lei A <lei.a.yao@intel.com>
> Cc: Alejandro Lucero <alejandro.lucero@netronome.com>; dev@dpdk.org;
> Xu, Qian Q <qian.q.xu@intel.com>; Lin, Xueqin <xueqin.lin@intel.com>;
> Burakov, Anatoly <anatoly.burakov@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA
> mask
> 
> 29/10/2018 09:23, Yao, Lei A:
> > Hi, Lucero, Thomas
> >
> > This patch set will cause deadlock during memory initialization.
> > rte_memseg_walk and try_expand_heap both will lock
> > the file &mcfg->memory_hotplug_lock. So dead lock will occur.
> >
> > #0       rte_memseg_walk
> > #1  <-rte_eal_check_dma_mask
> > #2  <-alloc_pages_on_heap
> > #3  <-try_expand_heap_primary
> > #4  <-try_expand_heap
> >
> > Log as following:
> > EAL: TSC frequency is ~2494156 KHz
> > EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
> > [New Thread 0x7ffff5e0d700 (LWP 330350)]
> > EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> >
> > Could you have a check on this? A lot of test cases in our validation
> > team fail because of this. Thanks a lot!
> 
> Can we just call rte_memseg_walk_thread_unsafe()?
> 
> +Cc Anatoly

Hi, Thomas

I changed the call to rte_memseg_walk_thread_unsafe(), but it still
does not work.

EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: memseg iova 140000000, len 40000000, out of range
EAL:    using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 1
EAL: Restoring previous memory policy: 0
EAL: memseg iova 1bc0000000, len 40000000, out of range
EAL:    using dma mask ffffffffffffffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
PANIC in main():

BRs
Lei
> 
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > > 05/10/2018 14:45, Alejandro Lucero:
> > > > I sent a patchset about this to be applied on 17.11 stable. The memory
> > > > code has had main changes since that version, so here it is the patchset
> > > > adjusted to current master repo.
> > > >
> > > > This patchset adds, mainly, a check for ensuring IOVAs are within a
> > > > restricted range due to addressing limitations with some devices. There
> > > > are two known cases: NFP and IOMMU VT-d emulation.
> > > >
> > > > With this check IOVAs out of range are detected and PMDs can abort
> > > > initialization. For the VT-d case, IOVA VA mode is allowed as long as
> > > > IOVAs are within the supported range, avoiding to forbid IOVA VA by
> > > > default.
> > > >
> > > > For the addressing limitations known cases, there are just 40(NFP) or
> > > > 39(VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
> > > > imply 1TB(NFP) or 512M(VT-d) as upper limits, which is likely enough for
> > > > most systems. With machines using more memory, the added check
> will
> > > > ensure IOVAs within the range.
> > > >
> > > > With IOVA VA, and because the way the Linux kernel serves mmap calls
> > > > in 64 bits systems, 39 or 40 bits are not enough. It is possible to
> > > > give an address hint with a lower starting address than the default one
> > > > used by the kernel, and then ensuring the mmap uses that hint or hint
> plus
> > > > some offset. With 64 bits systems, the process virtual address space is
> > > > large enoguh for doing the hugepages mmaping within the supported
> > > range
> > > > when those addressing limitations exist. This patchset also adds a
> change
> > > > for using such a hint making the use of IOVA VA a more than likely
> > > > possibility when there are those addressing limitations.
> > > >
> > > > The check is not done by default but just when it is required. This
> > > > patchset adds the check for NFP initialization and for setting the IOVA
> > > > mode is an emulated VT-d is detected. Also, because the recent
> patchset
> > > > adding dynamic memory allocation, the check is also invoked for
> ensuring
> > > > the new memsegs are within the required range.
> > > >
> > > > This patchset could be applied to stable 18.05.
> > >
> > > Applied, thanks
> 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29  9:07       ` Thomas Monjalon
@ 2018-10-29  9:25         ` Alejandro Lucero
  2018-10-29  9:44           ` Yao, Lei A
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-29  9:25 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: lei.a.yao, dev, Xu, Qian Q, xueqin.lin, Burakov, Anatoly,
	Ferruh Yigit, Bruce Richardson

Can we have the configuration triggering this issue?

On Mon, Oct 29, 2018 at 9:07 AM Thomas Monjalon <thomas@monjalon.net> wrote:

> One more comment about this issue,
>
> There was no reply to the question asked by Alejandro on October 11th:
>         http://mails.dpdk.org/archives/dev/2018-October/115402.html
> and there were no more reviews despite all my requests:
>         http://mails.dpdk.org/archives/dev/2018-October/117475.html
> Without any more comment, I had to apply the patchset.
>
> Now we need to find a solution. Please suggest.
>
>
> 29/10/2018 09:42, Thomas Monjalon:
> > 29/10/2018 09:23, Yao, Lei A:
> > > Hi, Lucero, Thomas
> > >
> > > This patch set will cause deadlock during memory initialization.
> > > rte_memseg_walk and try_expand_heap both will lock
> > > the file &mcfg->memory_hotplug_lock. So dead lock will occur.
> > >
> > > #0       rte_memseg_walk
> > > #1  <-rte_eal_check_dma_mask
> > > #2  <-alloc_pages_on_heap
> > > #3  <-try_expand_heap_primary
> > > #4  <-try_expand_heap
> > >
> > > Log as following:
> > > EAL: TSC frequency is ~2494156 KHz
> > > EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
> > > [New Thread 0x7ffff5e0d700 (LWP 330350)]
> > > EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
> > > EAL: Trying to obtain current memory policy.
> > > EAL: Setting policy MPOL_PREFERRED for socket 0
> > > EAL: Restoring previous memory policy: 0
> > >
> > > Could you have a check on this? A lot of test cases in our validation
> > > team fail because of this. Thanks a lot!
> >
> > Can we just call rte_memseg_walk_thread_unsafe()?
> >
> > +Cc Anatoly
> >
> >
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > > > 05/10/2018 14:45, Alejandro Lucero:
> > > > > I sent a patchset about this to be applied on 17.11 stable. The
> memory
> > > > > code has had main changes since that version, so here it is the
> patchset
> > > > > adjusted to current master repo.
> > > > >
> > > > > This patchset adds, mainly, a check for ensuring IOVAs are within a
> > > > > restricted range due to addressing limitations with some devices.
> There
> > > > > are two known cases: NFP and IOMMU VT-d emulation.
> > > > >
> > > > > With this check IOVAs out of range are detected and PMDs can abort
> > > > > initialization. For the VT-d case, IOVA VA mode is allowed as long
> as
> > > > > IOVAs are within the supported range, avoiding to forbid IOVA VA by
> > > > > default.
> > > > >
> > > > > For the addressing limitations known cases, there are just 40(NFP)
> or
> > > > > 39(VT-d) bits for handling IOVAs. When using IOVA PA, those
> limitations
> > > > > imply 1TB(NFP) or 512M(VT-d) as upper limits, which is likely
> enough for
> > > > > most systems. With machines using more memory, the added check will
> > > > > ensure IOVAs within the range.
> > > > >
> > > > > With IOVA VA, and because the way the Linux kernel serves mmap
> calls
> > > > > in 64 bits systems, 39 or 40 bits are not enough. It is possible to
> > > > > give an address hint with a lower starting address than the
> default one
> > > > > used by the kernel, and then ensuring the mmap uses that hint or
> hint plus
> > > > > some offset. With 64 bits systems, the process virtual address
> space is
> > > > > large enoguh for doing the hugepages mmaping within the supported
> > > > range
> > > > > when those addressing limitations exist. This patchset also adds a
> change
> > > > > for using such a hint making the use of IOVA VA a more than likely
> > > > > possibility when there are those addressing limitations.
> > > > >
> > > > > The check is not done by default but just when it is required. This
> > > > > patchset adds the check for NFP initialization and for setting the
> IOVA
> > > > > mode is an emulated VT-d is detected. Also, because the recent
> patchset
> > > > > adding dynamic memory allocation, the check is also invoked for
> ensuring
> > > > > the new memsegs are within the required range.
> > > > >
> > > > > This patchset could be applied to stable 18.05.
> > > >
> > > > Applied, thanks
>
>
>
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29  8:42     ` Thomas Monjalon
@ 2018-10-29  9:07       ` Thomas Monjalon
  2018-10-29  9:25         ` Alejandro Lucero
  2018-10-29  9:36       ` Yao, Lei A
  1 sibling, 1 reply; 62+ messages in thread
From: Thomas Monjalon @ 2018-10-29  9:07 UTC (permalink / raw)
  To: Yao, Lei A
  Cc: Alejandro Lucero, dev, Xu, Qian Q, Lin, Xueqin, anatoly.burakov,
	ferruh.yigit, bruce.richardson

One more comment about this issue,

There was no reply to the question asked by Alejandro on October 11th:
	http://mails.dpdk.org/archives/dev/2018-October/115402.html
and there were no more reviews despite all my requests:
	http://mails.dpdk.org/archives/dev/2018-October/117475.html
Without any further comments, I had to apply the patchset.

Now we need to find a solution. Please suggest one.


29/10/2018 09:42, Thomas Monjalon:
> 29/10/2018 09:23, Yao, Lei A:
> > Hi, Lucero, Thomas
> > 
> > This patch set causes a deadlock during memory initialization.
> > rte_memseg_walk and try_expand_heap both take the lock
> > &mcfg->memory_hotplug_lock, so a deadlock occurs.
> > 
> > #0       rte_memseg_walk
> > #1  <-rte_eal_check_dma_mask
> > #2  <-alloc_pages_on_heap
> > #3  <-try_expand_heap_primary   
> > #4  <-try_expand_heap
> > 
> > The log is as follows:
> > EAL: TSC frequency is ~2494156 KHz
> > EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
> > [New Thread 0x7ffff5e0d700 (LWP 330350)]
> > EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> > 
> > Could you check this? A lot of test cases in our validation
> > team fail because of this. Thanks a lot!
> 
> Can we just call rte_memseg_walk_thread_unsafe()?
> 
> +Cc Anatoly
> 
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > > 05/10/2018 14:45, Alejandro Lucero:
> > > > I sent a patchset about this to be applied on 17.11 stable. The memory
> > > > code has changed substantially since that version, so here is the patchset
> > > > adjusted to the current master repo.
> > > >
> > > > This patchset adds, mainly, a check for ensuring IOVAs are within a
> > > > restricted range due to addressing limitations with some devices. There
> > > > are two known cases: NFP and IOMMU VT-d emulation.
> > > >
> > > > With this check, IOVAs out of range are detected and PMDs can abort
> > > > initialization. For the VT-d case, IOVA VA mode is allowed as long as
> > > > IOVAs are within the supported range, which avoids forbidding IOVA VA
> > > > by default.
> > > >
> > > > For the known cases of addressing limitations, there are just 40 (NFP) or
> > > > 39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
> > > > imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough
> > > > for most systems. On machines with more memory, the added check will
> > > > ensure IOVAs are within the range.
> > > >
> > > > With IOVA VA, and because of the way the Linux kernel serves mmap calls
> > > > on 64-bit systems, 39 or 40 bits are not enough. It is possible to
> > > > give an address hint with a lower starting address than the default one
> > > > used by the kernel, and then ensure that mmap uses that hint or the hint
> > > > plus some offset. On 64-bit systems, the process virtual address space
> > > > is large enough for mmapping the hugepages within the supported range
> > > > when those addressing limitations exist. This patchset also adds a change
> > > > for using such a hint, making the use of IOVA VA a more than likely
> > > > possibility under those addressing limitations.
> > > >
> > > > The check is not done by default but only when it is required. This
> > > > patchset adds the check for NFP initialization and for setting the IOVA
> > > > mode if an emulated VT-d is detected. Also, because of the recent
> > > > patchset adding dynamic memory allocation, the check is also invoked to
> > > > ensure the new memsegs are within the required range.
> > > >
> > > > This patchset could be applied to stable 18.05.
> > > 
> > > Applied, thanks

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-29  8:23   ` Yao, Lei A
@ 2018-10-29  8:42     ` Thomas Monjalon
  2018-10-29  9:07       ` Thomas Monjalon
  2018-10-29  9:36       ` Yao, Lei A
  0 siblings, 2 replies; 62+ messages in thread
From: Thomas Monjalon @ 2018-10-29  8:42 UTC (permalink / raw)
  To: Yao, Lei A
  Cc: Alejandro Lucero, dev, Xu, Qian Q, Lin, Xueqin, anatoly.burakov

29/10/2018 09:23, Yao, Lei A:
> Hi, Lucero, Thomas
> 
> This patch set causes a deadlock during memory initialization.
> rte_memseg_walk and try_expand_heap both take the lock
> &mcfg->memory_hotplug_lock, so a deadlock occurs.
> 
> #0       rte_memseg_walk
> #1  <-rte_eal_check_dma_mask
> #2  <-alloc_pages_on_heap
> #3  <-try_expand_heap_primary   
> #4  <-try_expand_heap
> 
> The log is as follows:
> EAL: TSC frequency is ~2494156 KHz
> EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
> [New Thread 0x7ffff5e0d700 (LWP 330350)]
> EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
> EAL: Trying to obtain current memory policy.
> EAL: Setting policy MPOL_PREFERRED for socket 0
> EAL: Restoring previous memory policy: 0
> 
> Could you check this? A lot of test cases in our validation
> team fail because of this. Thanks a lot!

Can we just call rte_memseg_walk_thread_unsafe()?
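
To illustrate the question, here is a minimal sketch of the "locked wrapper plus unlocked variant" pattern. It is not DPDK code: hotplug_lock, walk(), walk_unsafe() and expand_heap() are hypothetical stand-ins, and a pthread error-checking mutex is assumed in place of the real lock (whose type in DPDK may differ) so that re-acquiring it reports EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for mcfg->memory_hotplug_lock. */
static pthread_mutex_t hotplug_lock;

/* Create the lock as an error-checking mutex so a relock is reported. */
static int lock_init(void)
{
    pthread_mutexattr_t attr;
    int ret;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    ret = pthread_mutex_init(&hotplug_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Unlocked variant: the caller must already hold hotplug_lock. */
static int walk_unsafe(void)
{
    return 0; /* a real walk would visit every memseg here */
}

/* Locked variant: takes the lock itself, then delegates. */
static int walk(void)
{
    int ret = pthread_mutex_lock(&hotplug_lock);

    if (ret != 0)
        return -ret; /* -EDEADLK when this thread already owns the lock */
    ret = walk_unsafe();
    pthread_mutex_unlock(&hotplug_lock);
    return ret;
}

/* Models the heap expansion path: holds the lock while running a check. */
static int expand_heap(int (*check)(void))
{
    int ret;

    assert(check != NULL);
    ret = pthread_mutex_lock(&hotplug_lock);
    if (ret != 0)
        return -ret;
    ret = check();
    pthread_mutex_unlock(&hotplug_lock);
    return ret;
}
```

In this sketch, expand_heap(walk) fails with -EDEADLK because the lock is taken twice on the same thread, which mirrors the reported call chain, while expand_heap(walk_unsafe) succeeds. Whether the unlocked variant is safe on every path that reaches the DMA mask check is a separate question that the sketch does not answer.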

+Cc Anatoly


> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> > 05/10/2018 14:45, Alejandro Lucero:
> > > I sent a patchset about this to be applied on 17.11 stable. The memory
> > > code has changed substantially since that version, so here is the patchset
> > > adjusted to the current master repo.
> > >
> > > This patchset adds, mainly, a check for ensuring IOVAs are within a
> > > restricted range due to addressing limitations with some devices. There
> > > are two known cases: NFP and IOMMU VT-d emulation.
> > >
> > > With this check, IOVAs out of range are detected and PMDs can abort
> > > initialization. For the VT-d case, IOVA VA mode is allowed as long as
> > > IOVAs are within the supported range, which avoids forbidding IOVA VA
> > > by default.
> > >
> > > For the known cases of addressing limitations, there are just 40 (NFP) or
> > > 39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
> > > imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough
> > > for most systems. On machines with more memory, the added check will
> > > ensure IOVAs are within the range.
> > >
> > > With IOVA VA, and because of the way the Linux kernel serves mmap calls
> > > on 64-bit systems, 39 or 40 bits are not enough. It is possible to
> > > give an address hint with a lower starting address than the default one
> > > used by the kernel, and then ensure that mmap uses that hint or the hint
> > > plus some offset. On 64-bit systems, the process virtual address space
> > > is large enough for mmapping the hugepages within the supported range
> > > when those addressing limitations exist. This patchset also adds a change
> > > for using such a hint, making the use of IOVA VA a more than likely
> > > possibility under those addressing limitations.
> > >
> > > The check is not done by default but only when it is required. This
> > > patchset adds the check for NFP initialization and for setting the IOVA
> > > mode if an emulated VT-d is detected. Also, because of the recent
> > > patchset adding dynamic memory allocation, the check is also invoked to
> > > ensure the new memsegs are within the required range.
> > >
> > > This patchset could be applied to stable 18.05.
> > 
> > Applied, thanks

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-28 21:04 ` Thomas Monjalon
@ 2018-10-29  8:23   ` Yao, Lei A
  2018-10-29  8:42     ` Thomas Monjalon
  0 siblings, 1 reply; 62+ messages in thread
From: Yao, Lei A @ 2018-10-29  8:23 UTC (permalink / raw)
  To: Thomas Monjalon, Alejandro Lucero; +Cc: dev, Xu, Qian Q, Lin, Xueqin

Hi, Lucero, Thomas

This patch set causes a deadlock during memory initialization.
rte_memseg_walk and try_expand_heap both take the lock
&mcfg->memory_hotplug_lock, so a deadlock occurs.

#0       rte_memseg_walk
#1  <-rte_eal_check_dma_mask
#2  <-alloc_pages_on_heap
#3  <-try_expand_heap_primary   
#4  <-try_expand_heap

The log is as follows:
EAL: TSC frequency is ~2494156 KHz
EAL: Master lcore 0 is ready (tid=7ffff7fe3c00;cpuset=[0])
[New Thread 0x7ffff5e0d700 (LWP 330350)]
EAL: lcore 1 is ready (tid=7ffff5e0d700;cpuset=[1])
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0

Could you check this? A lot of test cases in our validation
team fail because of this. Thanks a lot!

BRs
Lei


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Monday, October 29, 2018 5:04 AM
> To: Alejandro Lucero <alejandro.lucero@netronome.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA
> mask
> 
> 05/10/2018 14:45, Alejandro Lucero:
> > I sent a patchset about this to be applied on 17.11 stable. The memory
> > code has changed substantially since that version, so here is the patchset
> > adjusted to the current master repo.
> >
> > This patchset adds, mainly, a check for ensuring IOVAs are within a
> > restricted range due to addressing limitations with some devices. There
> > are two known cases: NFP and IOMMU VT-d emulation.
> >
> > With this check, IOVAs out of range are detected and PMDs can abort
> > initialization. For the VT-d case, IOVA VA mode is allowed as long as
> > IOVAs are within the supported range, which avoids forbidding IOVA VA
> > by default.
> >
> > For the known cases of addressing limitations, there are just 40 (NFP) or
> > 39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
> > imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough
> > for most systems. On machines with more memory, the added check will
> > ensure IOVAs are within the range.
> >
> > With IOVA VA, and because of the way the Linux kernel serves mmap calls
> > on 64-bit systems, 39 or 40 bits are not enough. It is possible to
> > give an address hint with a lower starting address than the default one
> > used by the kernel, and then ensure that mmap uses that hint or the hint
> > plus some offset. On 64-bit systems, the process virtual address space
> > is large enough for mmapping the hugepages within the supported range
> > when those addressing limitations exist. This patchset also adds a change
> > for using such a hint, making the use of IOVA VA a more than likely
> > possibility under those addressing limitations.
> >
> > The check is not done by default but only when it is required. This
> > patchset adds the check for NFP initialization and for setting the IOVA
> > mode if an emulated VT-d is detected. Also, because of the recent
> > patchset adding dynamic memory allocation, the check is also invoked to
> > ensure the new memsegs are within the required range.
> >
> > This patchset could be applied to stable 18.05.
> 
> Applied, thanks
> 
> 
> 

* Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
  2018-10-05 12:45 [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
@ 2018-10-28 21:04 ` Thomas Monjalon
  2018-10-29  8:23   ` Yao, Lei A
  0 siblings, 1 reply; 62+ messages in thread
From: Thomas Monjalon @ 2018-10-28 21:04 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dev

05/10/2018 14:45, Alejandro Lucero:
> I sent a patchset about this to be applied on 17.11 stable. The memory
> code has changed substantially since that version, so here is the patchset
> adjusted to the current master repo.
>
> This patchset adds, mainly, a check for ensuring IOVAs are within a
> restricted range due to addressing limitations with some devices. There
> are two known cases: NFP and IOMMU VT-d emulation.
>
> With this check, IOVAs out of range are detected and PMDs can abort
> initialization. For the VT-d case, IOVA VA mode is allowed as long as
> IOVAs are within the supported range, which avoids forbidding IOVA VA
> by default.
>
> For the known cases of addressing limitations, there are just 40 (NFP) or
> 39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
> imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough
> for most systems. On machines with more memory, the added check will
> ensure IOVAs are within the range.
>
> With IOVA VA, and because of the way the Linux kernel serves mmap calls
> on 64-bit systems, 39 or 40 bits are not enough. It is possible to
> give an address hint with a lower starting address than the default one
> used by the kernel, and then ensure that mmap uses that hint or the hint
> plus some offset. On 64-bit systems, the process virtual address space
> is large enough for mmapping the hugepages within the supported range
> when those addressing limitations exist. This patchset also adds a change
> for using such a hint, making the use of IOVA VA a more than likely
> possibility under those addressing limitations.
>
> The check is not done by default but only when it is required. This
> patchset adds the check for NFP initialization and for setting the IOVA
> mode if an emulated VT-d is detected. Also, because of the recent
> patchset adding dynamic memory allocation, the check is also invoked to
> ensure the new memsegs are within the required range.
> 
> This patchset could be applied to stable 18.05.

Applied, thanks

* [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
@ 2018-10-05 12:45 Alejandro Lucero
  2018-10-28 21:04 ` Thomas Monjalon
  0 siblings, 1 reply; 62+ messages in thread
From: Alejandro Lucero @ 2018-10-05 12:45 UTC (permalink / raw)
  To: dev

I sent a patchset about this to be applied on 17.11 stable. The memory
code has changed substantially since that version, so here is the patchset
adjusted to the current master repo.

This patchset adds, mainly, a check for ensuring IOVAs are within a
restricted range due to addressing limitations with some devices. There
are two known cases: NFP and IOMMU VT-d emulation.

With this check, IOVAs out of range are detected and PMDs can abort
initialization. For the VT-d case, IOVA VA mode is allowed as long as
IOVAs are within the supported range, which avoids forbidding IOVA VA
by default.

For the known cases of addressing limitations, there are just 40 (NFP) or
39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough
for most systems. On machines with more memory, the added check will
ensure IOVAs are within the range.
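
As a rough illustration of the arithmetic, a 40-bit mask covers addresses below 2^40 bytes (1TB) and a 39-bit mask covers addresses below 2^39 bytes (512GB). The sketch below is not the patchset's rte_eal_check_dma_mask implementation: struct seg and both helper names are hypothetical, and it only shows the kind of range test such a check could perform on one segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory segment: start IOVA and length. */
struct seg {
    uint64_t iova;
    uint64_t len;
};

/* Address bits that a device with `maskbits` address bits cannot drive. */
static uint64_t dma_unreachable_bits(unsigned int maskbits)
{
    return maskbits >= 64 ? 0 : ~UINT64_C(0) << maskbits;
}

/* True when every byte of [iova, iova + len) is addressable. */
static bool seg_fits_dma_mask(const struct seg *s, unsigned int maskbits)
{
    uint64_t last;

    assert(s != NULL);
    if (s->len == 0)
        return (s->iova & dma_unreachable_bits(maskbits)) == 0;

    last = s->iova + s->len - 1;      /* last byte of the segment */
    if (last < s->iova)               /* wrapped past 2^64 */
        return false;
    /* iova <= last, so checking the last byte covers the whole range */
    return (last & dma_unreachable_bits(maskbits)) == 0;
}
```

Under these assumptions, a segment starting at IOVA 0 with a length of exactly 1TB passes with a 40-bit mask and fails with a 39-bit one.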

With IOVA VA, and because of the way the Linux kernel serves mmap calls
on 64-bit systems, 39 or 40 bits are not enough. It is possible to
give an address hint with a lower starting address than the default one
used by the kernel, and then ensure that mmap uses that hint or the hint
plus some offset. On 64-bit systems, the process virtual address space
is large enough for mmapping the hugepages within the supported range
when those addressing limitations exist. This patchset also adds a change
for using such a hint, making the use of IOVA VA a more than likely
possibility under those addressing limitations.
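
The hint idea can be sketched roughly as follows. This is not the code from the patchset: map_below_limit() and its parameters are hypothetical, anonymous memory is mapped instead of hugepages, and a 64-bit Linux process is assumed. Without MAP_FIXED the kernel treats the address only as a hint, so the returned address has to be checked, and the hint is advanced by a step (for example the page size) before retrying.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Try to map `len` bytes so that the whole mapping lies below
 * 1 << maskbits, starting from `hint` and moving up by `step` bytes
 * per attempt. Returns NULL when no suitable mapping was obtained.
 */
static void *map_below_limit(uintptr_t hint, size_t len, size_t step,
                             unsigned int maskbits, int tries)
{
    uintptr_t limit;

    assert(maskbits < 64 && len > 0);   /* 64-bit uintptr_t assumed */
    limit = (uintptr_t)1 << maskbits;
    if (len > limit)
        return NULL;

    for (int i = 0; i < tries; i++) {
        void *p;

        if (hint > limit - len)         /* hint itself is out of range */
            break;
        p = mmap((void *)hint, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        if ((uintptr_t)p <= limit - len)
            return p;                   /* mapping is within the range */
        munmap(p, len);                 /* kernel ignored the hint */
        hint += step;
    }
    return NULL;
}
```

A caller could, for instance, start with a hint of 4GB and a 40-bit limit, then release the region with munmap() once it is no longer needed.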

The check is not done by default but only when it is required. This
patchset adds the check for NFP initialization and for setting the IOVA
mode if an emulated VT-d is detected. Also, because of the recent
patchset adding dynamic memory allocation, the check is also invoked to
ensure the new memsegs are within the required range.

This patchset could be applied to stable 18.05.

v2:
 - change logs from INFO to DEBUG
 - only keep dma mask if the device is capable of addressing allocated memory
 - add ABI changes
 - change hint address increment to page size
 - split pci/bus commit in two
 - fix commits

v3:
 - remove previous code about keeping dma mask

end of thread, other threads:[~2018-10-30 15:09 UTC | newest]

Thread overview: 62+ messages
2018-07-04 12:53 [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
2018-07-10  8:56   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
2018-07-10  9:34     ` Alejandro Lucero
2018-07-10 10:06       ` Eelco Chaudron
2018-07-10 10:52         ` Alejandro Lucero
2018-07-10 11:14           ` Eelco Chaudron
2018-07-10 11:33             ` Burakov, Anatoly
2018-07-10 11:43               ` Alejandro Lucero
2018-07-10 11:55                 ` Eelco Chaudron
2018-07-10 11:40             ` Alejandro Lucero
2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 2/6] ethdev: add function for checking IOVAs by a device Alejandro Lucero
2018-07-07 17:30   ` Andrew Rybchenko
2018-07-10  8:57   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
2018-07-10  9:42     ` Alejandro Lucero
2018-07-10  9:44       ` Alejandro Lucero
2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 3/6] bus/pci: use IOVAs check when setting IOVA mode Alejandro Lucero
2018-07-10 10:14   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
2018-07-10 15:37     ` Alejandro Lucero
2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 4/6] mem: use address hint for mapping hugepages Alejandro Lucero
2018-07-10 11:15   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 5/6] net/nfp: check hugepages IOVAs based on DMA mask Alejandro Lucero
2018-07-10 10:17   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
2018-07-04 12:53 ` [dpdk-dev] [PATCH v3 6/6] net/nfp: support IOVA VA mode Alejandro Lucero
2018-07-10 10:18   ` [dpdk-dev] [dpdk-stable] " Eelco Chaudron
2018-10-05 12:45 [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
2018-10-28 21:04 ` Thomas Monjalon
2018-10-29  8:23   ` Yao, Lei A
2018-10-29  8:42     ` Thomas Monjalon
2018-10-29  9:07       ` Thomas Monjalon
2018-10-29  9:25         ` Alejandro Lucero
2018-10-29  9:44           ` Yao, Lei A
2018-10-29  9:36       ` Yao, Lei A
2018-10-29  9:48         ` Thomas Monjalon
2018-10-29 10:11           ` Alejandro Lucero
2018-10-29 10:15             ` Alejandro Lucero
2018-10-29 11:39               ` Alejandro Lucero
2018-10-29 11:46                 ` Thomas Monjalon
2018-10-29 12:55                   ` Alejandro Lucero
2018-10-29 13:18                     ` Yao, Lei A
2018-10-29 13:40                       ` Alejandro Lucero
2018-10-29 14:18                         ` Thomas Monjalon
2018-10-29 14:35                           ` Alejandro Lucero
2018-10-29 18:54                           ` Yongseok Koh
2018-10-29 19:37                             ` Alejandro Lucero
2018-10-30 10:10                               ` Burakov, Anatoly
2018-10-30 10:11                           ` Burakov, Anatoly
2018-10-30 10:19                             ` Alejandro Lucero
2018-10-30  3:20                         ` Lin, Xueqin
2018-10-30  9:41                           ` Alejandro Lucero
2018-10-30 10:33                             ` Lin, Xueqin
2018-10-30 10:38                               ` Alejandro Lucero
2018-10-30 12:21                                 ` Lin, Xueqin
2018-10-30 12:37                                   ` Alejandro Lucero
2018-10-30 14:04                                     ` Alejandro Lucero
2018-10-30 14:14                                       ` Burakov, Anatoly
2018-10-30 14:45                                         ` Alejandro Lucero
2018-10-30 14:45                                       ` Lin, Xueqin
2018-10-30 14:57                                         ` Alejandro Lucero
2018-10-30 15:09                                           ` Lin, Xueqin
2018-10-30 10:18                 ` Burakov, Anatoly
2018-10-30 10:23                   ` Alejandro Lucero
