DPDK patches and discussions
* [dpdk-dev] [RFC] Add support for device dma mask
@ 2018-06-26 17:37 Alejandro Lucero
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 1/6] eal: add internal " Alejandro Lucero
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-26 17:37 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov

This RFC tries to handle devices with addressing limitations. NFP 4000/6000
devices can only handle addresses of up to 40 bits, which is a problem for
physical addresses on machines with more than 1TB of memory. And because of
how iovas are configured, being either equivalent to physical addresses or
based on virtual addresses, the problem can be even more likely to appear.

I tried to solve this some time ago:

https://www.mail-archive.com/dev@dpdk.org/msg45214.html

It was delayed because there were some changes in progress with EAL device
handling, and, being honest, I completely forgot about this until now, when
I have had to work on supporting NFP devices with DPDK and non-root users.

I was working on a patch to be applied to the main DPDK branch upstream, but
because of the changes to memory initialization over the last months, it
cannot be backported to stable versions, at least not the part where the
hugepage iovas are checked.

I realize stable versions only allow bug fixes, and this patchset could
arguably not be considered one. But without it, DPDK could end up being used,
however unlikely, on a machine with more than 1TB of memory, with the NFP
then using the wrong DMA host addresses.

Although virtual addresses used as iovas are more dangerous, for DPDK
versions before 18.05 this is no worse than with physical addresses, because
iovas, when physical addresses are not available, are based on a starting
address set to 0x0. Since 18.05, those iovas can be, and usually are, higher
than 1TB, as they come from the 64-bit address space and by default the
kernel uses a starting point far higher than 1TB.

This patchset applies to stable 17.11.3 but I will be happy to submit patches, if
required, for other DPDK stable versions.
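
As a rough illustration of the check the patchset performs, assuming a
40-bit device like the NFP (the names below are examples only, not part of
the patches themselves):

#include <stdint.h>
#include <stdbool.h>

#define NFP_ADDR_BITS 40

/* true if the iova is representable with the given number of address bits */
static bool
iova_fits_mask(uint64_t iova, uint8_t maskbits)
{
	uint64_t mask = (1ULL << maskbits) - 1;

	return (iova & ~mask) == 0;
}

Every hugepage iova a device is expected to DMA to or from must satisfy
iova_fits_mask(iova, NFP_ADDR_BITS); patch 3 below refuses to continue at
EAL init time if any hugepage fails that check.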

* [dpdk-dev] [PATCH 1/6] eal: add internal dma mask
  2018-06-26 17:37 [dpdk-dev] [RFC] Add support for device dma mask Alejandro Lucero
@ 2018-06-26 17:37 ` Alejandro Lucero
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 2/6] mem: add hugepages check Alejandro Lucero
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-26 17:37 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov

Devices can have addressing limitations, and an internal dma mask tracks
the most restrictive dma mask set by any device.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 lib/librte_eal/common/eal_common_options.c | 1 +
 lib/librte_eal/common/eal_internal_cfg.h   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 996a034..2d7c839 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -205,6 +205,7 @@ struct device_option {
 	for (i = 0; i < MAX_HUGEPAGE_SIZES; i++)
 		internal_cfg->hugepage_info[i].lock_descriptor = -1;
 	internal_cfg->base_virtaddr = 0;
+	internal_cfg->dma_mask = 0;
 
 	internal_cfg->syslog_facility = LOG_DAEMON;
 
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index fa6ccbe..e1e2944 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -84,6 +84,7 @@ struct internal_config {
 	const char *mbuf_pool_ops_name;   /**< mbuf pool ops name */
 	unsigned num_hugepage_sizes;      /**< how many sizes on this system */
 	struct hugepage_info hugepage_info[MAX_HUGEPAGE_SIZES];
+	uint64_t dma_mask;
 };
 extern struct internal_config internal_config; /**< Global EAL configuration. */
 
-- 
1.9.1

* [dpdk-dev] [PATCH 2/6] mem: add hugepages check
  2018-06-26 17:37 [dpdk-dev] [RFC] Add support for device dma mask Alejandro Lucero
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 1/6] eal: add internal " Alejandro Lucero
@ 2018-06-26 17:37 ` Alejandro Lucero
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 3/6] eal: check hugepages within dma mask range Alejandro Lucero
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-26 17:37 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov

Devices can have addressing limitations and a driver can set a dma mask.
This patch adds a function for checking that hugepage iovas are within
the range supported by the dma mask.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 36 ++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 17c20d4..4c196a6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1334,6 +1334,42 @@ void numa_error(char *where)
 	return -1;
 }
 
+int
+rte_eal_memory_dma_mask_check(void)
+{
+	struct rte_mem_config *mcfg;
+	int i;
+	int total_segs_checked = 0;
+	uint64_t mask;
+
+	if (!internal_config.dma_mask)
+		return 0;
+
+	mask = 1ULL << internal_config.dma_mask;
+	mask -= 1;
+
+	/* get pointer to global configuration */
+	mcfg = rte_eal_get_configuration()->mem_config;
+
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		RTE_LOG(DEBUG, EAL, "Memseg %d with iova %"PRIx64" and mask %"PRIx64"\n", i,
+				    mcfg->memseg[i].iova, mask);
+
+		if (!mcfg->memseg[i].iova)
+			break;
+
+		if (mcfg->memseg[i].iova & ~mask) {
+			return -1;
+		}
+		total_segs_checked++;
+	}
+
+	RTE_LOG(DEBUG, EAL, "%d segments successfully checked with dma mask\n",
+			    total_segs_checked);
+
+	return 0;
+}
+
 /*
  * uses fstat to report the size of a file on disk
  */
-- 
1.9.1

* [dpdk-dev] [PATCH 3/6] eal: check hugepages within dma mask range
  2018-06-26 17:37 [dpdk-dev] [RFC] Add support for device dma mask Alejandro Lucero
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 1/6] eal: add internal " Alejandro Lucero
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 2/6] mem: add hugepages check Alejandro Lucero
@ 2018-06-26 17:37 ` Alejandro Lucero
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 4/6] mem: add function for setting internal dma mask Alejandro Lucero
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-26 17:37 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov

Hugepages get an iova address which could be out of range for devices
with addressing limitations. This patch checks that hugepages are within
the range if a dma mask is set by a device.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 lib/librte_eal/common/eal_private.h | 3 +++
 lib/librte_eal/linuxapp/eal/eal.c   | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 462226f..05db535 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -224,4 +224,7 @@
  */
 struct rte_bus *rte_bus_find_by_device_name(const char *str);
 
+/* if dma mask set by a device, check hugepages are not out of range */
+int rte_eal_memory_dma_mask_check(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 229eec9..eaa9325 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -960,6 +960,10 @@ static void rte_eal_init_alert(const char *msg)
 		return -1;
 	}
 
+	/* If dma mask set, check hugepages iovas are within the range */
+	if (rte_eal_memory_dma_mask_check() < 0)
+		rte_panic("iovas out of range\n");
+
 	/* initialize default service/lcore mappings and start running. Ignore
 	 * -ENOTSUP, as it indicates no service coremask passed to EAL.
 	 */
-- 
1.9.1

* [dpdk-dev] [PATCH 4/6] mem: add function for setting internal dma mask
  2018-06-26 17:37 [dpdk-dev] [RFC] Add support for device dma mask Alejandro Lucero
                   ` (2 preceding siblings ...)
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 3/6] eal: check hugepages within dma mask range Alejandro Lucero
@ 2018-06-26 17:37 ` Alejandro Lucero
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 5/6] ethdev: add function for " Alejandro Lucero
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-26 17:37 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov

A device with addressing limitations will invoke this function to set a
dma mask. It has no effect if another dma mask is already set and is more
restrictive than this one.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 lib/librte_eal/common/eal_common_memory.c  | 15 +++++++++++++++
 lib/librte_eal/common/include/rte_memory.h |  3 +++
 2 files changed, 18 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index fc6c44d..39bf98c 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -109,6 +109,21 @@
 	}
 }
 
+/* set global dma mask based on device dma mask */
+void
+rte_eal_set_dma_mask(uint8_t maskbits)
+{
+	/* If no dma mask yet this is the new one */
+	if (!internal_config.dma_mask) {
+		internal_config.dma_mask = maskbits;
+		return;
+	}
+
+	/* Set dma mask just if more restrictive than current one */
+	if (internal_config.dma_mask > maskbits)
+		internal_config.dma_mask = maskbits;
+}
+
 /* return the number of memory channels */
 unsigned rte_memory_get_nchannel(void)
 {
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 80a8fc0..a078c31 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -209,6 +209,9 @@ struct rte_memseg {
  */
 unsigned rte_memory_get_nrank(void);
 
+/* set global dma mask based on a specific device dma mask */
+void rte_eal_set_dma_mask(uint8_t maskbits);
+
 /**
  * Drivers based on uio will not load unless physical
  * addresses are obtainable. It is only possible to get
-- 
1.9.1

* [dpdk-dev] [PATCH 5/6] ethdev: add function for dma mask
  2018-06-26 17:37 [dpdk-dev] [RFC] Add support for device dma mask Alejandro Lucero
                   ` (3 preceding siblings ...)
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 4/6] mem: add function for setting internal dma mask Alejandro Lucero
@ 2018-06-26 17:37 ` Alejandro Lucero
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 6/6] net/nfp: set " Alejandro Lucero
  2018-06-27  8:17 ` [dpdk-dev] [RFC] Add support for device " Burakov, Anatoly
  6 siblings, 0 replies; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-26 17:37 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov

This patch adds an ethdev wrapper which calls the generic EAL function for
setting the dma mask.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 lib/librte_ether/rte_ethdev.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index eba11ca..e3979e4 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2799,6 +2799,18 @@ int rte_eth_dev_set_vlan_ether_type(uint16_t port_id,
 int rte_eth_dev_set_vlan_pvid(uint16_t port_id, uint16_t pvid, int on);
 
 /**
+ * Set global dma mask by a device
+ *
+ * @param maskbits
+ *  mask length in bits
+ *
+ */
+static inline void
+rte_eth_dev_set_dma_mask(uint8_t maskbits) {
+	rte_eal_set_dma_mask(maskbits);
+}
+
+/**
  *
  * Retrieve a burst of input packets from a receive queue of an Ethernet
  * device. The retrieved packets are stored in *rte_mbuf* structures whose
-- 
1.9.1

* [dpdk-dev] [PATCH 6/6] net/nfp: set dma mask
  2018-06-26 17:37 [dpdk-dev] [RFC] Add support for device dma mask Alejandro Lucero
                   ` (4 preceding siblings ...)
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 5/6] ethdev: add function for " Alejandro Lucero
@ 2018-06-26 17:37 ` Alejandro Lucero
  2018-06-27  8:17 ` [dpdk-dev] [RFC] Add support for device " Burakov, Anatoly
  6 siblings, 0 replies; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-26 17:37 UTC (permalink / raw)
  To: dev; +Cc: stable, anatoly.burakov

NFP 4000/6000 devices cannot use iova addresses requiring more than
40 bits. This patch sets a dma mask to avoid hugepages with iovas
requiring more than those 40 bits.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 drivers/net/nfp/nfp_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index d9cd047..7ac03f0 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2915,6 +2915,8 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 
 	rte_free(port_name);
 
+	rte_eth_dev_set_dma_mask(40);
+
 	return ret;
 }
 
-- 
1.9.1

* Re: [dpdk-dev] [RFC] Add support for device dma mask
  2018-06-26 17:37 [dpdk-dev] [RFC] Add support for device dma mask Alejandro Lucero
                   ` (5 preceding siblings ...)
  2018-06-26 17:37 ` [dpdk-dev] [PATCH 6/6] net/nfp: set " Alejandro Lucero
@ 2018-06-27  8:17 ` Burakov, Anatoly
  2018-06-27 10:13   ` Alejandro Lucero
  6 siblings, 1 reply; 16+ messages in thread
From: Burakov, Anatoly @ 2018-06-27  8:17 UTC (permalink / raw)
  To: Alejandro Lucero, dev; +Cc: stable

On 26-Jun-18 6:37 PM, Alejandro Lucero wrote:
> This RFC tries to handle devices with addressing limitations. NFP devices
> 4000/6000 can just handle addresses with 40 bits implying problems for handling
> physical address when machines have more than 1TB of memory. But because how
> iovas are configured, which can be equivalent to physical addresses or based on
> virtual addresses, this can be a more likely problem.
> 
> I tried to solve this some time ago:
> 
> https://www.mail-archive.com/dev@dpdk.org/msg45214.html
> 
> It was delayed because there was some changes in progress with EAL device
> handling, and, being honest, I completely forgot about this until now, when
> I have had to work on supporting NFP devices with DPDK and non-root users.
> 
> I was working on a patch for being applied on main DPDK branch upstream, but
> because changes to memory initialization during the last months, this can not
> be backported to stable versions, at least the part where the hugepages iovas
> are checked.
> 
> I realize stable versions only allow bug fixing, and this patchset could
> arguably not be considered as so. But without this, it could be, although
> unlikely, a DPDK used in a machine with more than 1TB, and then NFP using
> the wrong DMA host addresses.
> 
> Although virtual addresses used as iovas are more dangerous, for DPDK versions
> before 18.05 this is not worse than with physical addresses, because iovas,
> when physical addresses are not available, are based on a starting address set
> to 0x0.

You might want to look at the following patch:

http://patches.dpdk.org/patch/37149/

Since this patch, IOVA as VA mode uses VA addresses, and that has been 
backported to earlier releases. I don't think there's any case where we 
used zero-based addresses any more.

>  Since 18.05, those iovas can, and usually are, higher than 1TB, as they
> are based on 64 bits address space addresses, and by default the kernel uses a
> starting point far higher than 1TB.
> 
> This patchset applies to stable 17.11.3 but I will be happy to submit patches, if
> required, for other DPDK stable versions.
> 
> 


-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [RFC] Add support for device dma mask
  2018-06-27  8:17 ` [dpdk-dev] [RFC] Add support for device " Burakov, Anatoly
@ 2018-06-27 10:13   ` Alejandro Lucero
  2018-06-27 13:24     ` Burakov, Anatoly
  0 siblings, 1 reply; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-27 10:13 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, stable

On Wed, Jun 27, 2018 at 9:17 AM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:

> On 26-Jun-18 6:37 PM, Alejandro Lucero wrote:
>
>> This RFC tries to handle devices with addressing limitations. NFP devices
>> 4000/6000 can just handle addresses with 40 bits implying problems for
>> handling
>> physical address when machines have more than 1TB of memory. But because
>> how
>> iovas are configured, which can be equivalent to physical addresses or
>> based on
>> virtual addresses, this can be a more likely problem.
>>
>> I tried to solve this some time ago:
>>
>> https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>>
>> It was delayed because there was some changes in progress with EAL device
>> handling, and, being honest, I completely forgot about this until now,
>> when
>> I have had to work on supporting NFP devices with DPDK and non-root users.
>>
>> I was working on a patch for being applied on main DPDK branch upstream,
>> but
>> because changes to memory initialization during the last months, this can
>> not
>> be backported to stable versions, at least the part where the hugepages
>> iovas
>> are checked.
>>
>> I realize stable versions only allow bug fixing, and this patchset could
>> arguably not be considered as so. But without this, it could be, although
>> unlikely, a DPDK used in a machine with more than 1TB, and then NFP using
>> the wrong DMA host addresses.
>>
>> Although virtual addresses used as iovas are more dangerous, for DPDK
>> versions
>> before 18.05 this is not worse than with physical addresses, because
>> iovas,
>> when physical addresses are not available, are based on a starting
>> address set
>> to 0x0.
>>
>
> You might want to look at the following patch:
>
> http://patches.dpdk.org/patch/37149/
>
> Since this patch, IOVA as VA mode uses VA addresses, and that has been
> backported to earlier releases. I don't think there's any case where we
> used zero-based addresses any more.
>
>
But memsegs get the iova based on the hugepage physaddr, and for VA mode
that is based on 0x0 as the starting point.

And as far as I know, memseg iovas are what end up being used for the IOMMU
mappings and what devices will use.


>
>  Since 18.05, those iovas can, and usually are, higher than 1TB, as they
>
>> are based on 64 bits address space addresses, and by default the kernel
>> uses a
>> starting point far higher than 1TB.
>>
>> This patchset applies to stable 17.11.3 but I will be happy to submit
>> patches, if
>> required, for other DPDK stable versions.
>>
>>
>>
>
> --
> Thanks,
> Anatoly
>

* Re: [dpdk-dev] [RFC] Add support for device dma mask
  2018-06-27 10:13   ` Alejandro Lucero
@ 2018-06-27 13:24     ` Burakov, Anatoly
  2018-06-27 16:52       ` Alejandro Lucero
  0 siblings, 1 reply; 16+ messages in thread
From: Burakov, Anatoly @ 2018-06-27 13:24 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dev, stable

On 27-Jun-18 11:13 AM, Alejandro Lucero wrote:
> 
> 
> On Wed, Jun 27, 2018 at 9:17 AM, Burakov, Anatoly 
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> 
>     On 26-Jun-18 6:37 PM, Alejandro Lucero wrote:
> 
>         This RFC tries to handle devices with addressing limitations.
>         NFP devices
>         4000/6000 can just handle addresses with 40 bits implying
>         problems for handling
>         physical address when machines have more than 1TB of memory. But
>         because how
>         iovas are configured, which can be equivalent to physical
>         addresses or based on
>         virtual addresses, this can be a more likely problem.
> 
>         I tried to solve this some time ago:
> 
>         https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>
> 
>         It was delayed because there was some changes in progress with
>         EAL device
>         handling, and, being honest, I completely forgot about this
>         until now, when
>         I have had to work on supporting NFP devices with DPDK and
>         non-root users.
> 
>         I was working on a patch for being applied on main DPDK branch
>         upstream, but
>         because changes to memory initialization during the last months,
>         this can not
>         be backported to stable versions, at least the part where the
>         hugepages iovas
>         are checked.
> 
>         I realize stable versions only allow bug fixing, and this
>         patchset could
>         arguably not be considered as so. But without this, it could be,
>         although
>         unlikely, a DPDK used in a machine with more than 1TB, and then
>         NFP using
>         the wrong DMA host addresses.
> 
>         Although virtual addresses used as iovas are more dangerous, for
>         DPDK versions
>         before 18.05 this is not worse than with physical addresses,
>         because iovas,
>         when physical addresses are not available, are based on a
>         starting address set
>         to 0x0.
> 
> 
>     You might want to look at the following patch:
> 
>     http://patches.dpdk.org/patch/37149/
>     <http://patches.dpdk.org/patch/37149/>
> 
>     Since this patch, IOVA as VA mode uses VA addresses, and that has
>     been backported to earlier releases. I don't think there's any case
>     where we used zero-based addresses any more.
> 
> 
> But memsegs get the iova based on hugepages physaddr, and for VA mode 
> that is based on 0x0 as starting point.
> 
> And as far as I know, memsegs iovas are what end up being used for IOMMU 
> mappings and what devices will use.

When physaddrs are available, IOVA as PA mode assigns IOVA addresses to PA,
while IOVA as VA mode assigns IOVA addresses to VA (both 18.05+ and
pre-18.05 as per the above patch, which was applied to pre-18.05 stable
releases).

When physaddrs aren't available, IOVA as VA mode assigns IOVA addresses
to VA, both 18.05+ and pre-18.05, as per the above patch.

If physaddrs aren't available and IOVA as PA mode is used, then as far
as I can remember, even though technically memsegs get their addresses
set to 0x0 onwards, the actual addresses we get in memzones etc. are
RTE_BAD_IOVA.
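
Summarised as a sketch (a paraphrase for clarity, not the actual EAL code):

#include <stdint.h>
#include <stdbool.h>

enum iova_mode { IOVA_PA, IOVA_VA };	/* stands in for rte_eal_iova_mode() */

/* paraphrase of the assignment rules described above */
static uint64_t
iova_for_page(enum iova_mode mode, bool pa_available, uint64_t pa, uint64_t va)
{
	if (mode == IOVA_VA)
		return va;	/* IOVA as VA: always the virtual address */
	if (pa_available)
		return pa;	/* IOVA as PA with physaddrs available */
	return 0;		/* IOVA as PA without physaddrs: memsegs start
				 * at 0x0 (what the VFIO mapping then uses is
				 * discussed below) */
}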

> 
> 
>       Since 18.05, those iovas can, and usually are, higher than 1TB, as
>     they
> 
>         are based on 64 bits address space addresses, and by default the
>         kernel uses a
>         starting point far higher than 1TB.
> 
>         This patchset applies to stable 17.11.3 but I will be happy to
>         submit patches, if
>         required, for other DPDK stable versions.
> 
> 
> 
> 
>     -- 
>     Thanks,
>     Anatoly
> 
> 


-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [RFC] Add support for device dma mask
  2018-06-27 13:24     ` Burakov, Anatoly
@ 2018-06-27 16:52       ` Alejandro Lucero
  2018-06-28  8:54         ` Burakov, Anatoly
  0 siblings, 1 reply; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-27 16:52 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, stable

On Wed, Jun 27, 2018 at 2:24 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:

> On 27-Jun-18 11:13 AM, Alejandro Lucero wrote:
>
>
>>
>> On Wed, Jun 27, 2018 at 9:17 AM, Burakov, Anatoly <
>> anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>>
>>     On 26-Jun-18 6:37 PM, Alejandro Lucero wrote:
>>
>>         This RFC tries to handle devices with addressing limitations.
>>         NFP devices
>>         4000/6000 can just handle addresses with 40 bits implying
>>         problems for handling
>>         physical address when machines have more than 1TB of memory. But
>>         because how
>>         iovas are configured, which can be equivalent to physical
>>         addresses or based on
>>         virtual addresses, this can be a more likely problem.
>>
>>         I tried to solve this some time ago:
>>
>>         https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>
>>
>>         It was delayed because there was some changes in progress with
>>         EAL device
>>         handling, and, being honest, I completely forgot about this
>>         until now, when
>>         I have had to work on supporting NFP devices with DPDK and
>>         non-root users.
>>
>>         I was working on a patch for being applied on main DPDK branch
>>         upstream, but
>>         because changes to memory initialization during the last months,
>>         this can not
>>         be backported to stable versions, at least the part where the
>>         hugepages iovas
>>         are checked.
>>
>>         I realize stable versions only allow bug fixing, and this
>>         patchset could
>>         arguably not be considered as so. But without this, it could be,
>>         although
>>         unlikely, a DPDK used in a machine with more than 1TB, and then
>>         NFP using
>>         the wrong DMA host addresses.
>>
>>         Although virtual addresses used as iovas are more dangerous, for
>>         DPDK versions
>>         before 18.05 this is not worse than with physical addresses,
>>         because iovas,
>>         when physical addresses are not available, are based on a
>>         starting address set
>>         to 0x0.
>>
>>
>>     You might want to look at the following patch:
>>
>>     http://patches.dpdk.org/patch/37149/
>>     <http://patches.dpdk.org/patch/37149/>
>>
>>     Since this patch, IOVA as VA mode uses VA addresses, and that has
>>     been backported to earlier releases. I don't think there's any case
>>     where we used zero-based addresses any more.
>>
>>
>> But memsegs get the iova based on hugepages physaddr, and for VA mode
>> that is based on 0x0 as starting point.
>>
>> And as far as I know, memsegs iovas are what end up being used for IOMMU
>> mappings and what devices will use.
>>
>
> For when physaddrs are available, IOVA as PA mode assigns IOVA addresses
> to PA, while IOVA as VA mode assigns IOVA addresses to VA (both 18.05+ and
> pre-18.05 as per above patch, which was applied to pre-18.05 stable
> releases).
>
> When physaddrs aren't available, IOVA as VA mode assigns IOVA addresses to
> VA, both 18.05+ and pre-18.05, as per above patch.
>
>
This is right.


> If physaddrs aren't available and IOVA as PA mode is used, then i as far
> as i can remember, even though technically memsegs get their addresses set
> to 0x0 onwards, the actual addresses we get in memzones etc. are
> RTE_BAD_IOVA.
>
>
This is not right. Not sure if this was the intention, but if PA mode is
used and physaddrs are not available, this code inside vfio_type1_dma_map:

		if (rte_eal_iova_mode() == RTE_IOVA_VA)
			dma_map.iova = dma_map.vaddr;
		else
			dma_map.iova = ms[i].iova;

does the IOMMU mapping using the iovas and not the vaddr, with the iovas
starting at 0x0.

Note that the NFP PMD does not have the RTE_PCI_DRV_IOVA_AS_VA flag, so this
is always the case when executing DPDK apps as non-root users.

I would say that if there is no such flag and the IOVA mode is PA, the
mapping should fail, as it does with 18.05.

I could send a patch for this behaviour, but in that case I would like to
add that flag to the NFP PMD and include the hugepage check, along with
changes to how iovas are obtained when mmaping, keeping the iovas below
the proposed dma mask.
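
Roughly, the condition for refusing the mapping would be something like this
(a sketch only, assuming the existing rte_eal_using_phys_addrs() helper for
detecting the no-physaddrs case):

#include <stdbool.h>
#include <rte_eal.h>
#include <rte_memory.h>

/* sketch: true when the VFIO mapping should be refused because IOVA as PA
 * is in use but physical addresses are not available */
static bool
pa_mode_without_physaddrs(void)
{
	return rte_eal_iova_mode() == RTE_IOVA_PA &&
		!rte_eal_using_phys_addrs();
}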


>
>
>>
>>       Since 18.05, those iovas can, and usually are, higher than 1TB, as
>>     they
>>
>>         are based on 64 bits address space addresses, and by default the
>>         kernel uses a
>>         starting point far higher than 1TB.
>>
>>         This patchset applies to stable 17.11.3 but I will be happy to
>>         submit patches, if
>>         required, for other DPDK stable versions.
>>
>>
>>
>>
>>     --     Thanks,
>>     Anatoly
>>
>>
>>
>
> --
> Thanks,
> Anatoly
>

* Re: [dpdk-dev] [RFC] Add support for device dma mask
  2018-06-27 16:52       ` Alejandro Lucero
@ 2018-06-28  8:54         ` Burakov, Anatoly
  2018-06-28  9:56           ` Alejandro Lucero
  0 siblings, 1 reply; 16+ messages in thread
From: Burakov, Anatoly @ 2018-06-28  8:54 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dev, stable

On 27-Jun-18 5:52 PM, Alejandro Lucero wrote:
> 
> 
> On Wed, Jun 27, 2018 at 2:24 PM, Burakov, Anatoly 
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> 
>     On 27-Jun-18 11:13 AM, Alejandro Lucero wrote:
> 
> 
> 
>         On Wed, Jun 27, 2018 at 9:17 AM, Burakov, Anatoly
>         <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>
>         <mailto:anatoly.burakov@intel.com
>         <mailto:anatoly.burakov@intel.com>>> wrote:
> 
>              On 26-Jun-18 6:37 PM, Alejandro Lucero wrote:
> 
>                  This RFC tries to handle devices with addressing
>         limitations.
>                  NFP devices
>                  4000/6000 can just handle addresses with 40 bits implying
>                  problems for handling
>                  physical address when machines have more than 1TB of
>         memory. But
>                  because how
>                  iovas are configured, which can be equivalent to physical
>                  addresses or based on
>                  virtual addresses, this can be a more likely problem.
> 
>                  I tried to solve this some time ago:
> 
>         https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>
>                 
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>>
> 
>                  It was delayed because there was some changes in
>         progress with
>                  EAL device
>                  handling, and, being honest, I completely forgot about this
>                  until now, when
>                  I have had to work on supporting NFP devices with DPDK and
>                  non-root users.
> 
>                  I was working on a patch for being applied on main DPDK
>         branch
>                  upstream, but
>                  because changes to memory initialization during the
>         last months,
>                  this can not
>                  be backported to stable versions, at least the part
>         where the
>                  hugepages iovas
>                  are checked.
> 
>                  I realize stable versions only allow bug fixing, and this
>                  patchset could
>                  arguably not be considered as so. But without this, it
>         could be,
>                  although
>                  unlikely, a DPDK used in a machine with more than 1TB,
>         and then
>                  NFP using
>                  the wrong DMA host addresses.
> 
>                  Although virtual addresses used as iovas are more
>         dangerous, for
>                  DPDK versions
>                  before 18.05 this is not worse than with physical
>         addresses,
>                  because iovas,
>                  when physical addresses are not available, are based on a
>                  starting address set
>                  to 0x0.
> 
> 
>              You might want to look at the following patch:
> 
>         http://patches.dpdk.org/patch/37149/
>         <http://patches.dpdk.org/patch/37149/>
>              <http://patches.dpdk.org/patch/37149/
>         <http://patches.dpdk.org/patch/37149/>>
> 
>              Since this patch, IOVA as VA mode uses VA addresses, and
>         that has
>              been backported to earlier releases. I don't think there's
>         any case
>              where we used zero-based addresses any more.
> 
> 
>         But memsegs get the iova based on hugepages physaddr, and for VA
>         mode that is based on 0x0 as starting point.
> 
>         And as far as I know, memsegs iovas are what end up being used
>         for IOMMU mappings and what devices will use.
> 
> 
>     For when physaddrs are available, IOVA as PA mode assigns IOVA
>     addresses to PA, while IOVA as VA mode assigns IOVA addresses to VA
>     (both 18.05+ and pre-18.05 as per above patch, which was applied to
>     pre-18.05 stable releases).
> 
>     When physaddrs aren't available, IOVA as VA mode assigns IOVA
>     addresses to VA, both 18.05+ and pre-18.05, as per above patch.
> 
> 
> This is right.
> 
>     If physaddrs aren't available and IOVA as PA mode is used, then i as
>     far as i can remember, even though technically memsegs get their
>     addresses set to 0x0 onwards, the actual addresses we get in
>     memzones etc. are RTE_BAD_IOVA.
> 
> 
> This is not right. Not sure if this was the intention, but if PA mode 
> and physaddrs not available, this code inside vfio_type1_dma_map:
> 
> if(rte_eal_iova_mode() == RTE_IOVA_VA)
> 
> dma_map.iova = dma_map.vaddr;
> 
> else
> 
> dma_map.iova = ms[i].iova;
> 
> 
> does the IOMMU mapping using the iovas and not the vaddr, with the iovas 
> starting at 0x0.

Yep, you're right, apologies. I confused this with the no-huge option.

-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [RFC] Add support for device dma mask
  2018-06-28  8:54         ` Burakov, Anatoly
@ 2018-06-28  9:56           ` Alejandro Lucero
  2018-06-28 10:03             ` Burakov, Anatoly
  0 siblings, 1 reply; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-28  9:56 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, stable

On Thu, Jun 28, 2018 at 9:54 AM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:

> On 27-Jun-18 5:52 PM, Alejandro Lucero wrote:
>
>>
>>
>> On Wed, Jun 27, 2018 at 2:24 PM, Burakov, Anatoly <
>> anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>>
>>     On 27-Jun-18 11:13 AM, Alejandro Lucero wrote:
>>
>>
>>
>>         On Wed, Jun 27, 2018 at 9:17 AM, Burakov, Anatoly
>>         <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>
>>         <mailto:anatoly.burakov@intel.com
>>
>>         <mailto:anatoly.burakov@intel.com>>> wrote:
>>
>>              On 26-Jun-18 6:37 PM, Alejandro Lucero wrote:
>>
>>                  This RFC tries to handle devices with addressing
>>         limitations.
>>                  NFP devices
>>                  4000/6000 can just handle addresses with 40 bits implying
>>                  problems for handling
>>                  physical address when machines have more than 1TB of
>>         memory. But
>>                  because how
>>                  iovas are configured, which can be equivalent to physical
>>                  addresses or based on
>>                  virtual addresses, this can be a more likely problem.
>>
>>                  I tried to solve this some time ago:
>>
>>         https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>
>>                         <https://www.mail-archive.com/
>> dev@dpdk.org/msg45214.html
>>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>>
>>
>>                  It was delayed because there was some changes in
>>         progress with
>>                  EAL device
>>                  handling, and, being honest, I completely forgot about
>> this
>>                  until now, when
>>                  I have had to work on supporting NFP devices with DPDK
>> and
>>                  non-root users.
>>
>>                  I was working on a patch for being applied on main DPDK
>>         branch
>>                  upstream, but
>>                  because changes to memory initialization during the
>>         last months,
>>                  this can not
>>                  be backported to stable versions, at least the part
>>         where the
>>                  hugepages iovas
>>                  are checked.
>>
>>                  I realize stable versions only allow bug fixing, and this
>>                  patchset could
>>                  arguably not be considered as so. But without this, it
>>         could be,
>>                  although
>>                  unlikely, a DPDK used in a machine with more than 1TB,
>>         and then
>>                  NFP using
>>                  the wrong DMA host addresses.
>>
>>                  Although virtual addresses used as iovas are more
>>         dangerous, for
>>                  DPDK versions
>>                  before 18.05 this is not worse than with physical
>>         addresses,
>>                  because iovas,
>>                  when physical addresses are not available, are based on a
>>                  starting address set
>>                  to 0x0.
>>
>>
>>              You might want to look at the following patch:
>>
>>         http://patches.dpdk.org/patch/37149/
>>         <http://patches.dpdk.org/patch/37149/>
>>              <http://patches.dpdk.org/patch/37149/
>>         <http://patches.dpdk.org/patch/37149/>>
>>
>>              Since this patch, IOVA as VA mode uses VA addresses, and
>>         that has
>>              been backported to earlier releases. I don't think there's
>>         any case
>>              where we used zero-based addresses any more.
>>
>>
>>         But memsegs get the iova based on hugepages physaddr, and for VA
>>         mode that is based on 0x0 as starting point.
>>
>>         And as far as I know, memsegs iovas are what end up being used
>>         for IOMMU mappings and what devices will use.
>>
>>
>>     For when physaddrs are available, IOVA as PA mode assigns IOVA
>>     addresses to PA, while IOVA as VA mode assigns IOVA addresses to VA
>>     (both 18.05+ and pre-18.05 as per above patch, which was applied to
>>     pre-18.05 stable releases).
>>
>>     When physaddrs aren't available, IOVA as VA mode assigns IOVA
>>     addresses to VA, both 18.05+ and pre-18.05, as per above patch.
>>
>>
>> This is right.
>>
>>     If physaddrs aren't available and IOVA as PA mode is used, then i as
>>     far as i can remember, even though technically memsegs get their
>>     addresses set to 0x0 onwards, the actual addresses we get in
>>     memzones etc. are RTE_BAD_IOVA.
>>
>>
>> This is not right. Not sure if this was the intention, but if PA mode and
>> physaddrs not available, this code inside vfio_type1_dma_map:
>>
>> if(rte_eal_iova_mode() == RTE_IOVA_VA)
>>
>> dma_map.iova = dma_map.vaddr;
>>
>> else
>>
>> dma_map.iova = ms[i].iova;
>>
>>
>> does the IOMMU mapping using the iovas and not the vaddr, with the iovas
>> starting at 0x0.
>>
>
> Yep, you're right, apologies. I confused this with no-huge option.


So, what do you think about the patchset? Could it be applied to
stable versions?

I'll send a patch for the current 18.05 code which will have the dma mask
and the hugepage check, along with changes for doing the mmaps below the
dma mask limit.
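
As a rough sketch of what I mean by doing the mmaps below the dma mask limit
(the hint address and the helper name are illustrative only, not taken from
the patch):

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* try to obtain a VA that is also usable as an iova for a device limited
 * to 'maskbits' address bits */
static void *
map_below_dma_mask(size_t len, uint8_t maskbits)
{
	/* hint the kernel towards the lower half of the addressable range */
	void *hint = (void *)(uintptr_t)(1ULL << (maskbits - 1));
	void *va = mmap(hint, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (va == MAP_FAILED)
		return NULL;

	/* without MAP_FIXED the hint may be ignored, so verify the result */
	if ((uintptr_t)va + len > (1ULL << maskbits)) {
		munmap(va, len);
		return NULL;
	}
	return va;
}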


>
>
> --
> Thanks,
> Anatoly
>

* Re: [dpdk-dev] [RFC] Add support for device dma mask
  2018-06-28  9:56           ` Alejandro Lucero
@ 2018-06-28 10:03             ` Burakov, Anatoly
  2018-06-28 10:27               ` Alejandro Lucero
  0 siblings, 1 reply; 16+ messages in thread
From: Burakov, Anatoly @ 2018-06-28 10:03 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dev, stable

On 28-Jun-18 10:56 AM, Alejandro Lucero wrote:
> 
> 
> On Thu, Jun 28, 2018 at 9:54 AM, Burakov, Anatoly 
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> 
>     On 27-Jun-18 5:52 PM, Alejandro Lucero wrote:
> 
> 
> 
>         On Wed, Jun 27, 2018 at 2:24 PM, Burakov, Anatoly
>         <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>
>         <mailto:anatoly.burakov@intel.com
>         <mailto:anatoly.burakov@intel.com>>> wrote:
> 
>              On 27-Jun-18 11:13 AM, Alejandro Lucero wrote:
> 
> 
> 
>                  On Wed, Jun 27, 2018 at 9:17 AM, Burakov, Anatoly
>                  <anatoly.burakov@intel.com
>         <mailto:anatoly.burakov@intel.com>
>         <mailto:anatoly.burakov@intel.com
>         <mailto:anatoly.burakov@intel.com>>
>                  <mailto:anatoly.burakov@intel.com
>         <mailto:anatoly.burakov@intel.com>
> 
>                  <mailto:anatoly.burakov@intel.com
>         <mailto:anatoly.burakov@intel.com>>>> wrote:
> 
>                       On 26-Jun-18 6:37 PM, Alejandro Lucero wrote:
> 
>                           This RFC tries to handle devices with addressing
>                  limitations.
>                           NFP devices
>                           4000/6000 can just handle addresses with 40
>         bits implying
>                           problems for handling
>                           physical address when machines have more than
>         1TB of
>                  memory. But
>                           because how
>                           iovas are configured, which can be equivalent
>         to physical
>                           addresses or based on
>                           virtual addresses, this can be a more likely
>         problem.
> 
>                           I tried to solve this some time ago:
> 
>         https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>
>                 
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>>
>                                 
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>
>                 
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>>>
> 
>                           It was delayed because there was some changes in
>                  progress with
>                           EAL device
>                           handling, and, being honest, I completely
>         forgot about this
>                           until now, when
>                           I have had to work on supporting NFP devices
>         with DPDK and
>                           non-root users.
> 
>                           I was working on a patch for being applied on
>         main DPDK
>                  branch
>                           upstream, but
>                           because changes to memory initialization
>         during the
>                  last months,
>                           this can not
>                           be backported to stable versions, at least the
>         part
>                  where the
>                           hugepages iovas
>                           are checked.
> 
>                           I realize stable versions only allow bug
>         fixing, and this
>                           patchset could
>                           arguably not be considered as so. But without
>         this, it
>                  could be,
>                           although
>                           unlikely, a DPDK used in a machine with more
>         than 1TB,
>                  and then
>                           NFP using
>                           the wrong DMA host addresses.
> 
>                           Although virtual addresses used as iovas are more
>                  dangerous, for
>                           DPDK versions
>                           before 18.05 this is not worse than with physical
>                  addresses,
>                           because iovas,
>                           when physical addresses are not available, are
>         based on a
>                           starting address set
>                           to 0x0.
> 
> 
>                       You might want to look at the following patch:
> 
>         http://patches.dpdk.org/patch/37149/
>         <http://patches.dpdk.org/patch/37149/>
>                  <http://patches.dpdk.org/patch/37149/
>         <http://patches.dpdk.org/patch/37149/>>
>                       <http://patches.dpdk.org/patch/37149/
>         <http://patches.dpdk.org/patch/37149/>
>                  <http://patches.dpdk.org/patch/37149/
>         <http://patches.dpdk.org/patch/37149/>>>
> 
>                       Since this patch, IOVA as VA mode uses VA
>         addresses, and
>                  that has
>                       been backported to earlier releases. I don't think
>         there's
>                  any case
>                       where we used zero-based addresses any more.
> 
> 
>                  But memsegs get the iova based on hugepages physaddr,
>         and for VA
>                  mode that is based on 0x0 as starting point.
> 
>                  And as far as I know, memsegs iovas are what end up
>         being used
>                  for IOMMU mappings and what devices will use.
> 
> 
>              For when physaddrs are available, IOVA as PA mode assigns IOVA
>              addresses to PA, while IOVA as VA mode assigns IOVA
>         addresses to VA
>              (both 18.05+ and pre-18.05 as per above patch, which was
>         applied to
>              pre-18.05 stable releases).
> 
>              When physaddrs aren't available, IOVA as VA mode assigns IOVA
>              addresses to VA, both 18.05+ and pre-18.05, as per above patch.
> 
> 
>         This is right.
> 
>              If physaddrs aren't available and IOVA as PA mode is used,
>         then i as
>              far as i can remember, even though technically memsegs get
>         their
>              addresses set to 0x0 onwards, the actual addresses we get in
>              memzones etc. are RTE_BAD_IOVA.
> 
> 
>         This is not right. Not sure if this was the intention, but if PA
>         mode and physaddrs not available, this code inside
>         vfio_type1_dma_map:
> 
>         if(rte_eal_iova_mode() == RTE_IOVA_VA)
> 
>         dma_map.iova = dma_map.vaddr;
> 
>         else
> 
>         dma_map.iova = ms[i].iova;
> 
> 
>         does the IOMMU mapping using the iovas and not the vaddr, with
>         the iovas starting at 0x0.
> 
> 
>     Yep, you're right, apologies. I confused this with no-huge option.
> 
> 
> So, what do you think about the patchset? Could it be this applied to 
> stable versions?
> 
> I'll send a patch for current 18.05 code which will have the dma mask 
> and the hugepage check, along with changes for doing the mmaps below the 
> dma mask limit.

I've looked through the code, and it looks OK to me (bar some things like
missing .map file additions and a gratuitous rte_panic :) ).

There was a patch/discussion not too long ago about DMA masks for some
IOMMUs - perhaps we can also extend this approach to that?

https://patches.dpdk.org/patch/33192/

> 
> 
> 
>     -- 
>     Thanks,
>     Anatoly
> 
> 


-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [RFC] Add support for device dma mask
  2018-06-28 10:03             ` Burakov, Anatoly
@ 2018-06-28 10:27               ` Alejandro Lucero
  2018-06-28 10:30                 ` Burakov, Anatoly
  0 siblings, 1 reply; 16+ messages in thread
From: Alejandro Lucero @ 2018-06-28 10:27 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev, stable

On Thu, Jun 28, 2018 at 11:03 AM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:

> On 28-Jun-18 10:56 AM, Alejandro Lucero wrote:
>
>>
>>
>> On Thu, Jun 28, 2018 at 9:54 AM, Burakov, Anatoly <
>> anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>>
>>     On 27-Jun-18 5:52 PM, Alejandro Lucero wrote:
>>
>>
>>
>>         On Wed, Jun 27, 2018 at 2:24 PM, Burakov, Anatoly
>>         <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>
>>         <mailto:anatoly.burakov@intel.com
>>         <mailto:anatoly.burakov@intel.com>>> wrote:
>>
>>              On 27-Jun-18 11:13 AM, Alejandro Lucero wrote:
>>
>>
>>
>>                  On Wed, Jun 27, 2018 at 9:17 AM, Burakov, Anatoly
>>                  <anatoly.burakov@intel.com
>>         <mailto:anatoly.burakov@intel.com>
>>         <mailto:anatoly.burakov@intel.com
>>         <mailto:anatoly.burakov@intel.com>>
>>                  <mailto:anatoly.burakov@intel.com
>>         <mailto:anatoly.burakov@intel.com>
>>
>>                  <mailto:anatoly.burakov@intel.com
>>         <mailto:anatoly.burakov@intel.com>>>> wrote:
>>
>>                       On 26-Jun-18 6:37 PM, Alejandro Lucero wrote:
>>
>>                           This RFC tries to handle devices with addressing
>>                  limitations.
>>                           NFP devices
>>                           4000/6000 can just handle addresses with 40
>>         bits implying
>>                           problems for handling
>>                           physical address when machines have more than
>>         1TB of
>>                  memory. But
>>                           because how
>>                           iovas are configured, which can be equivalent
>>         to physical
>>                           addresses or based on
>>                           virtual addresses, this can be a more likely
>>         problem.
>>
>>                           I tried to solve this some time ago:
>>
>>         https://www.mail-archive.com/dev@dpdk.org/msg45214.html
>>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>
>>                         <https://www.mail-archive.com/
>> dev@dpdk.org/msg45214.html
>>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>>
>>                                         <https://www.mail-archive.com/
>> dev@dpdk.org/msg45214.html
>>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>
>>                         <https://www.mail-archive.com/
>> dev@dpdk.org/msg45214.html
>>         <https://www.mail-archive.com/dev@dpdk.org/msg45214.html>>>
>>
>>                           It was delayed because there was some changes in
>>                  progress with
>>                           EAL device
>>                           handling, and, being honest, I completely
>>         forgot about this
>>                           until now, when
>>                           I have had to work on supporting NFP devices
>>         with DPDK and
>>                           non-root users.
>>
>>                           I was working on a patch for being applied on
>>         main DPDK
>>                  branch
>>                           upstream, but
>>                           because changes to memory initialization
>>         during the
>>                  last months,
>>                           this can not
>>                           be backported to stable versions, at least the
>>         part
>>                  where the
>>                           hugepages iovas
>>                           are checked.
>>
>>                           I realize stable versions only allow bug
>>         fixing, and this
>>                           patchset could
>>                           arguably not be considered as so. But without
>>         this, it
>>                  could be,
>>                           although
>>                           unlikely, a DPDK used in a machine with more
>>         than 1TB,
>>                  and then
>>                           NFP using
>>                           the wrong DMA host addresses.
>>
>>                           Although virtual addresses used as iovas are
>> more
>>                  dangerous, for
>>                           DPDK versions
>>                           before 18.05 this is not worse than with
>> physical
>>                  addresses,
>>                           because iovas,
>>                           when physical addresses are not available, are
>>         based on a
>>                           starting address set
>>                           to 0x0.
>>
>>
>>                       You might want to look at the following patch:
>>
>>         http://patches.dpdk.org/patch/37149/
>>         <http://patches.dpdk.org/patch/37149/>
>>                  <http://patches.dpdk.org/patch/37149/
>>         <http://patches.dpdk.org/patch/37149/>>
>>                       <http://patches.dpdk.org/patch/37149/
>>         <http://patches.dpdk.org/patch/37149/>
>>                  <http://patches.dpdk.org/patch/37149/
>>         <http://patches.dpdk.org/patch/37149/>>>
>>
>>                       Since this patch, IOVA as VA mode uses VA
>>         addresses, and
>>                  that has
>>                       been backported to earlier releases. I don't think
>>         there's
>>                  any case
>>                       where we used zero-based addresses any more.
>>
>>
>>                  But memsegs get the iova based on hugepages physaddr,
>>         and for VA
>>                  mode that is based on 0x0 as starting point.
>>
>>                  And as far as I know, memsegs iovas are what end up
>>         being used
>>                  for IOMMU mappings and what devices will use.
>>
>>
>>              For when physaddrs are available, IOVA as PA mode assigns
>> IOVA
>>              addresses to PA, while IOVA as VA mode assigns IOVA
>>         addresses to VA
>>              (both 18.05+ and pre-18.05 as per above patch, which was
>>         applied to
>>              pre-18.05 stable releases).
>>
>>              When physaddrs aren't available, IOVA as VA mode assigns IOVA
>>              addresses to VA, both 18.05+ and pre-18.05, as per above
>> patch.
>>
>>
>>         This is right.
>>
>>              If physaddrs aren't available and IOVA as PA mode is used,
>>         then i as
>>              far as i can remember, even though technically memsegs get
>>         their
>>              addresses set to 0x0 onwards, the actual addresses we get in
>>              memzones etc. are RTE_BAD_IOVA.
>>
>>
>>         This is not right. Not sure if this was the intention, but if PA
>>         mode and physaddrs not available, this code inside
>>         vfio_type1_dma_map:
>>
>>         if(rte_eal_iova_mode() == RTE_IOVA_VA)
>>
>>         dma_map.iova = dma_map.vaddr;
>>
>>         else
>>
>>         dma_map.iova = ms[i].iova;
>>
>>
>>         does the IOMMU mapping using the iovas and not the vaddr, with
>>         the iovas starting at 0x0.
>>
>>
>>     Yep, you're right, apologies. I confused this with no-huge option.
>>
>>
>> So, what do you think about the patchset? Could it be this applied to
>> stable versions?
>>
>> I'll send a patch for current 18.05 code which will have the dma mask and
>> the hugepage check, along with changes for doing the mmaps below the dma
>> mask limit.
>>
>
> I've looked through the code, it looks OK to me (bar some things like
> missing .map file additions and a gratuitous rte_panic :) ).
>
> There was a patch/discussion not too long ago about DMA masks for some
> IOMMU's - perhaps we can also extend this approach to that?
>
> https://patches.dpdk.org/patch/33192/
>
>
>
I completely missed that patch.

It seems this approach could also cover that case, setting a dma mask when
an emulated VT-d with that 39-bit restriction is detected.

I'll take a look at that patch and submit a new patchset including changes
for that case. I also forgot the hotplug case, where the hugepage check
needs to be invoked.
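
Something along these lines, just as a sketch (the detection helper is
hypothetical; only rte_eal_set_dma_mask() comes from this patchset):

#include <stdint.h>

#define EMULATED_VTD_ADDR_BITS 39

int emulated_vtd_detected(void);		/* hypothetical detection helper */
void rte_eal_set_dma_mask(uint8_t maskbits);	/* from this patchset */

static void
limit_dma_for_emulated_vtd(void)
{
	/* apply the same restriction mechanism used for the 40-bit NFP case */
	if (emulated_vtd_detected())
		rte_eal_set_dma_mask(EMULATED_VTD_ADDR_BITS);
}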

Thanks



>
>>
>>
>>     --     Thanks,
>>     Anatoly
>>
>>
>>
>
> --
> Thanks,
> Anatoly
>

* Re: [dpdk-dev] [RFC] Add support for device dma mask
  2018-06-28 10:27               ` Alejandro Lucero
@ 2018-06-28 10:30                 ` Burakov, Anatoly
  0 siblings, 0 replies; 16+ messages in thread
From: Burakov, Anatoly @ 2018-06-28 10:30 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dev, stable

On 28-Jun-18 11:27 AM, Alejandro Lucero wrote:
> 
> 
> On Thu, Jun 28, 2018 at 11:03 AM, Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
> 
>     On 28-Jun-18 10:56 AM, Alejandro Lucero wrote:
> 
> 
> 
>         On Thu, Jun 28, 2018 at 9:54 AM, Burakov, Anatoly
>         <anatoly.burakov@intel.com> wrote:
> 
>              On 27-Jun-18 5:52 PM, Alejandro Lucero wrote:
> 
> 
> 
>                  On Wed, Jun 27, 2018 at 2:24 PM, Burakov, Anatoly
>                  <anatoly.burakov@intel.com> wrote:
> 
>                       On 27-Jun-18 11:13 AM, Alejandro Lucero wrote:
> 
> 
> 
>                           On Wed, Jun 27, 2018 at 9:17 AM, Burakov, Anatoly
>                           <anatoly.burakov@intel.com> wrote:
> 
>                                On 26-Jun-18 6:37 PM, Alejandro Lucero wrote:
> 
>                                    This RFC tries to handle devices with
>         addressing
>                           limitations.
>                                    NFP devices
>                                    4000/6000 can just handle addresses
>         with 40
>                  bits implying
>                                    problems for handling
>                                    physical address when machines have
>         more than
>                  1TB of
>                           memory. But
>                                    because how
>                                    iovas are configured, which can be
>         equivalent
>                  to physical
>                                    addresses or based on
>                                    virtual addresses, this can be a more
>         likely
>                  problem.
> 
>                                    I tried to solve this some time ago:
> 
>                                    https://www.mail-archive.com/dev@dpdk.org/msg45214.html
> 
>                                    It was delayed because there was some
>         changes in
>                           progress with
>                                    EAL device
>                                    handling, and, being honest, I completely
>                  forgot about this
>                                    until now, when
>                                    I have had to work on supporting NFP
>         devices
>                  with DPDK and
>                                    non-root users.
> 
>                                    I was working on a patch for being
>         applied on
>                  main DPDK
>                           branch
>                                    upstream, but
>                                    because changes to memory initialization
>                  during the
>                           last months,
>                                    this can not
>                                    be backported to stable versions, at
>         least the
>                  part
>                           where the
>                                    hugepages iovas
>                                    are checked.
> 
>                                    I realize stable versions only allow bug
>                  fixing, and this
>                                    patchset could
>                                    arguably not be considered as so. But
>         without
>                  this, it
>                           could be,
>                                    although
>                                    unlikely, a DPDK used in a machine
>         with more
>                  than 1TB,
>                           and then
>                                    NFP using
>                                    the wrong DMA host addresses.
> 
>                                    Although virtual addresses used as
>         iovas are more
>                           dangerous, for
>                                    DPDK versions
>                                    before 18.05 this is not worse than
>         with physical
>                           addresses,
>                                    because iovas,
>                                    when physical addresses are not
>         available, are
>                  based on a
>                                    starting address set
>                                    to 0x0.
> 
> 
>                                You might want to look at the following
>                                patch:
> 
>                                http://patches.dpdk.org/patch/37149/
> 
>                                Since this patch, IOVA as VA mode uses VA
>                  addresses, and
>                           that has
>                                been backported to earlier releases. I
>         don't think
>                  there's
>                           any case
>                                where we used zero-based addresses any more.
> 
> 
>                           But memsegs get the iova based on hugepages
>         physaddr,
>                  and for VA
>                           mode that is based on 0x0 as starting point.
> 
>                           And as far as I know, memsegs iovas are what
>         end up
>                  being used
>                           for IOMMU mappings and what devices will use.
> 
> 
>                       When physaddrs are available, IOVA as PA mode assigns
>                       IOVA addresses to PA, while IOVA as VA mode assigns
>                       IOVA addresses to VA (both 18.05+ and pre-18.05 as per
>                       the above patch, which was applied to pre-18.05 stable
>                       releases).
> 
>                       When physaddrs aren't available, IOVA as VA mode
>                       assigns IOVA addresses to VA, both 18.05+ and
>                       pre-18.05, as per the above patch.
> 
> 
>                  This is right.
> 
>                       If physaddrs aren't available and IOVA as PA mode is
>                       used, then, as far as I can remember, even though
>                       technically memsegs get their addresses set from 0x0
>                       onwards, the actual addresses we get in memzones etc.
>                       are RTE_BAD_IOVA.
> 
> 
>                  This is not right. Not sure if this was the intention, but
>                  if PA mode and physaddrs are not available, this code
>                  inside vfio_type1_dma_map:
> 
>                  if (rte_eal_iova_mode() == RTE_IOVA_VA)
>                          dma_map.iova = dma_map.vaddr;
>                  else
>                          dma_map.iova = ms[i].iova;
> 
>                  does the IOMMU mapping using the iovas and not the vaddr,
>                  with the iovas starting at 0x0.
> 
> 
>              Yep, you're right, apologies. I confused this with no-huge
>         option.
> 
> 
>         So, what do you think about the patchset? Could this be applied
>         to the stable versions?
> 
>         I'll send a patch for the current 18.05 code which will have the
>         dma mask and the hugepage check, along with changes for doing the
>         mmaps below the dma mask limit.
> 
> 
>     I've looked through the code, it looks OK to me (bar some things
>     like missing .map file additions and a gratuitous rte_panic :) ).
> 
>     There was a patch/discussion not too long ago about DMA masks for
>     some IOMMUs - perhaps we can also extend this approach to that?
> 
>     https://patches.dpdk.org/patch/33192/
> 
> 
> 
> I completely missed that patch.
> 
> It seems this could also be applied to that case by setting a dma mask
> when an emulated VT-d with the 39-bit restriction is detected.
> 
> I'll take a look at that patch and submit a new patchset including
> changes for that case. I also forgot the hotplug case, where the
> hugepage check needs to be invoked.

Great.

Just in case, the original link I provided was to a v2. The v3 was accepted:

https://patches.dpdk.org/patch/33650/
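One way such a 39-bit limitation could be detected on Linux, sketched under
the assumption that the intel-iommu driver exposes the VT-d capability
register at /sys/class/iommu/dmar0/intel-iommu/cap and that MGAW sits in
bits 21:16 (supported width = field value + 1); this is illustrative only
and not code from the patches referenced above:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define VTD_CAP_MGAW_SHIFT 16
    #define VTD_CAP_MGAW_MASK  (0x3fULL << VTD_CAP_MGAW_SHIFT)

    /* Returns the maximum guest address width in bits, or -1 on error.
     * Only looks at the first DMAR unit; a real implementation would
     * check all of them. */
    static int
    vtd_max_guest_address_width(void)
    {
        uint64_t cap;
        FILE *f = fopen("/sys/class/iommu/dmar0/intel-iommu/cap", "r");

        if (f == NULL)
            return -1;
        if (fscanf(f, "%" SCNx64, &cap) != 1) {
            fclose(f);
            return -1;
        }
        fclose(f);

        /* MGAW is encoded as (width - 1). An emulated VT-d commonly
         * reports 39 bits here. */
        return (int)((cap & VTD_CAP_MGAW_MASK) >> VTD_CAP_MGAW_SHIFT) + 1;
    }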

Thanks!

> 
> Thanks
> 
> 
> 
> 
>              --
>              Thanks,
>              Anatoly
> 
> 
> 
> 
>     -- 
>     Thanks,
>     Anatoly
> 
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-06-28 10:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-26 17:37 [dpdk-dev] [RFC] Add support for device dma mask Alejandro Lucero
2018-06-26 17:37 ` [dpdk-dev] [PATCH 1/6] eal: add internal " Alejandro Lucero
2018-06-26 17:37 ` [dpdk-dev] [PATCH 2/6] mem: add hugepages check Alejandro Lucero
2018-06-26 17:37 ` [dpdk-dev] [PATCH 3/6] eal: check hugepages within dma mask range Alejandro Lucero
2018-06-26 17:37 ` [dpdk-dev] [PATCH 4/6] mem: add function for setting internal dma mask Alejandro Lucero
2018-06-26 17:37 ` [dpdk-dev] [PATCH 5/6] ethdev: add function for " Alejandro Lucero
2018-06-26 17:37 ` [dpdk-dev] [PATCH 6/6] net/nfp: set " Alejandro Lucero
2018-06-27  8:17 ` [dpdk-dev] [RFC] Add support for device " Burakov, Anatoly
2018-06-27 10:13   ` Alejandro Lucero
2018-06-27 13:24     ` Burakov, Anatoly
2018-06-27 16:52       ` Alejandro Lucero
2018-06-28  8:54         ` Burakov, Anatoly
2018-06-28  9:56           ` Alejandro Lucero
2018-06-28 10:03             ` Burakov, Anatoly
2018-06-28 10:27               ` Alejandro Lucero
2018-06-28 10:30                 ` Burakov, Anatoly

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).