DPDK patches and discussions
* [dpdk-dev] [RFC] ethdev: introduce DMA memory mapping for external memory
@ 2018-11-04 12:41 Shahaf Shuler
  2018-11-14 11:19 ` Burakov, Anatoly
  0 siblings, 1 reply; 17+ messages in thread
From: Shahaf Shuler @ 2018-11-04 12:41 UTC
  To: dev
  Cc: olgas, yskoh, pawelx.wodkowski, anatoly.burakov, gowrishankar.m,
	ferruh.yigit, thomas, arybchenko, shreyansh.jain

Requesting comments on the high-level changes in this patch.

The need to use external memory (memory that belongs to the application
and is not part of the DPDK hugepages) already exists.
Storage applications prefer to manage their own memory blocks for
efficient use of the storage device. GPU-based applications strive for
zero copy while processing the packet payload on the GPU cores. Finally,
vSwitch/vRouter applications (e.g. VPP) simply prefer to have full
control over the memory in use.

Recent work[1] in DPDK enabled the use of external memory; however, it
mostly treats VFIO as the only way to map memory.
While VFIO is common, some vendors use other ways to map memory
(e.g. Mellanox and NXP[2]).

This patch moves DMA mapping to vendor-agnostic APIs located under
ethdev. The APIs live in ethdev because a memory mapping should be
associated with specific port(s). Otherwise, the memory is mapped
multiple times through different frameworks, wasting memory on
redundant translation tables in the host or in the device.

For example, consider a host with Mellanox and Intel devices. Mapping
memory without specifying the port results in both an IOMMU
registration and a Verbs (Mellanox DMA map) registration.
Another example is two Mellanox devices on the same host. The memory
will be mapped for both, even though the application uses a mempool
per device.

To use the suggested APIs, the application allocates a memory block and
calls rte_eth_dma_map on every port that needs DMA access to it.
Later on, the application can use this memory to populate a mempool or
to attach mbufs to an external buffer.
When the memory should no longer be used by the devices, the application
calls rte_eth_dma_unmap on every port it registered the memory with.
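As a rough illustration of this flow, the C sketch below maps one region to a list of ports and unmaps it at teardown, rolling back any partial registrations if one port fails. So that it builds without DPDK, the proposed rte_eth_dma_map()/rte_eth_dma_unmap() are replaced by hypothetical stubs (stub_eth_dma_map(), stub_eth_dma_unmap()) that only count registrations per port. The helpers app_map_to_ports()/app_unmap_from_ports() and the rollback policy are likewise assumptions, not part of this RFC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the proposed per-port API: each "port"
 * only counts how many regions are currently registered with it. */
#define NB_PORTS 4
static const int port_present[NB_PORTS] = { 1, 1, 0, 1 }; /* port 2 absent */
static int port_maps[NB_PORTS];

static int stub_eth_dma_map(uint16_t port_id, uint64_t vaddr,
			    uint64_t iova, uint64_t len)
{
	(void)vaddr;
	(void)iova;
	if (port_id >= NB_PORTS || !port_present[port_id])
		return -ENODEV;
	if (len == 0)
		return -EINVAL;
	port_maps[port_id]++;
	return 0;
}

static int stub_eth_dma_unmap(uint16_t port_id, uint64_t vaddr,
			      uint64_t iova, uint64_t len)
{
	(void)vaddr;
	(void)iova;
	if (port_id >= NB_PORTS || !port_present[port_id])
		return -ENODEV;
	if (len == 0 || port_maps[port_id] == 0)
		return -EINVAL;
	port_maps[port_id]--;
	return 0;
}

/* Map one region to every port that needs DMA access to it. If a port
 * fails, unmap it from the ports already done so that no port keeps a
 * half-finished registration. Returns 0 or the first negative error. */
static int app_map_to_ports(const uint16_t *ports, size_t nb,
			    uint64_t vaddr, uint64_t iova, uint64_t len)
{
	for (size_t i = 0; i < nb; i++) {
		int ret = stub_eth_dma_map(ports[i], vaddr, iova, len);

		if (ret != 0) {
			while (i-- > 0)
				stub_eth_dma_unmap(ports[i], vaddr, iova, len);
			return ret;
		}
	}
	return 0;
}

/* Teardown: unmap the region from every port it was registered with. */
static void app_unmap_from_ports(const uint16_t *ports, size_t nb,
				 uint64_t vaddr, uint64_t iova, uint64_t len)
{
	for (size_t i = 0; i < nb; i++)
		stub_eth_dma_unmap(ports[i], vaddr, iova, len);
}
```

Only the ports passed in the list receive a registration, which is the point of a per-port API: memory used by a mempool dedicated to one device is not also registered with every other device on the host. Using the virtual address as the IOVA in a caller is only valid where IOVA equals VA, which is an assumption of the example.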

The drivers will implement the DMA map/unmap callbacks, most likely
with the help of the existing VFIO mapping.
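To make the per-device translation table concrete, here is a hypothetical driver-side sketch. toy_dma_map() records a region in a small per-device table (standing in for, e.g., a Verbs memory registration or an IOMMU entry), toy_dma_unmap() removes an exact match, and toy_vaddr_to_iova() performs the lookup a driver would need when posting a buffer. All names, the fixed table size, and the overlap and exact-match policies are assumptions for illustration; a real PMD would delegate to VFIO or its own kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_REGIONS 8

struct toy_region {
	uint64_t vaddr;
	uint64_t iova;
	uint64_t len;
	int used;
};

/* Hypothetical per-device state: the device's own translation table. */
struct toy_dev {
	struct toy_region regions[TOY_MAX_REGIONS];
};

/* Candidate body for dev_ops->dma_map: record [vaddr, vaddr + len). */
static int toy_dma_map(struct toy_dev *dev, uint64_t vaddr, uint64_t iova,
		       uint64_t len)
{
	struct toy_region *free_slot = NULL;

	if (len == 0 || vaddr + len < vaddr) /* empty or wraps around */
		return -EINVAL;
	for (int i = 0; i < TOY_MAX_REGIONS; i++) {
		struct toy_region *r = &dev->regions[i];

		if (!r->used) {
			if (free_slot == NULL)
				free_slot = r;
			continue;
		}
		/* Reject a region overlapping an existing registration. */
		if (vaddr < r->vaddr + r->len && r->vaddr < vaddr + len)
			return -EEXIST;
	}
	if (free_slot == NULL)
		return -ENOSPC;
	*free_slot = (struct toy_region){ vaddr, iova, len, 1 };
	return 0;
}

/* Candidate body for dev_ops->dma_unmap: only an exact match is removed. */
static int toy_dma_unmap(struct toy_dev *dev, uint64_t vaddr, uint64_t iova,
			 uint64_t len)
{
	for (int i = 0; i < TOY_MAX_REGIONS; i++) {
		struct toy_region *r = &dev->regions[i];

		if (r->used && r->vaddr == vaddr && r->iova == iova &&
		    r->len == len) {
			r->used = 0;
			return 0;
		}
	}
	return -ENOENT;
}

/* Translate an address inside a registered region into its IOVA.
 * Returns 0 on success, -ENOENT if the address was never mapped. */
static int toy_vaddr_to_iova(const struct toy_dev *dev, uint64_t vaddr,
			     uint64_t *iova)
{
	for (int i = 0; i < TOY_MAX_REGIONS; i++) {
		const struct toy_region *r = &dev->regions[i];

		if (r->used && vaddr >= r->vaddr && vaddr - r->vaddr < r->len) {
			*iova = r->iova + (vaddr - r->vaddr);
			return 0;
		}
	}
	return -ENOENT;
}
```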

Support for device hotplug/unplug is out of the scope of this patch;
however, it can be implemented the same way it is done for VFIO.

Cc: pawelx.wodkowski@intel.com
Cc: anatoly.burakov@intel.com
Cc: gowrishankar.m@linux.vnet.ibm.com
Cc: ferruh.yigit@intel.com
Cc: thomas@monjalon.net
Cc: arybchenko@solarflare.com
Cc: shreyansh.jain@nxp.com

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>

[1] commit 73a639085938 ("vfio: allow to map other memory regions")
[2] http://mails.dpdk.org/archives/dev/2018-September/111978.html
---
 lib/librte_eal/bsdapp/eal/eal.c          | 14 ----------
 lib/librte_eal/common/include/rte_vfio.h | 44 --------------------------------
 lib/librte_eal/linuxapp/eal/eal_vfio.c   | 22 ----------------
 lib/librte_ethdev/rte_ethdev.c           | 38 +++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 40 +++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev_core.h      | 14 ++++++++++
 6 files changed, 92 insertions(+), 80 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 508cbc46fd..44667e8446 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -885,20 +885,6 @@ int rte_vfio_clear_group(__rte_unused int vfio_group_fd)
 }
 
 int
-rte_vfio_dma_map(uint64_t __rte_unused vaddr, __rte_unused uint64_t iova,
-		  __rte_unused uint64_t len)
-{
-	return -1;
-}
-
-int
-rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused iova,
-		    __rte_unused uint64_t len)
-{
-	return -1;
-}
-
-int
 rte_vfio_get_group_num(__rte_unused const char *sysfs_base,
 		       __rte_unused const char *dev_addr,
 		       __rte_unused int *iommu_group_num)
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index cae96fab90..d8d966dd33 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -188,50 +188,6 @@ int
 rte_vfio_clear_group(int vfio_group_fd);
 
 /**
- * Map memory region for use with VFIO.
- *
- * @note Require at least one device to be attached at the time of
- *       mapping. DMA maps done via this API will only apply to default
- *       container and will not apply to any of the containers created
- *       via rte_vfio_container_create().
- *
- * @param vaddr
- *   Starting virtual address of memory to be mapped.
- *
- * @param iova
- *   Starting IOVA address of memory to be mapped.
- *
- * @param len
- *   Length of memory segment being mapped.
- *
- * @return
- *   0 if success.
- *   -1 on error.
- */
-int
-rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
-
-
-/**
- * Unmap memory region from VFIO.
- *
- * @param vaddr
- *   Starting virtual address of memory to be unmapped.
- *
- * @param iova
- *   Starting IOVA address of memory to be unmapped.
- *
- * @param len
- *   Length of memory segment being unmapped.
- *
- * @return
- *   0 if success.
- *   -1 on error.
- */
-
-int
-rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
-/**
  * Parse IOMMU group number for a device
  *
  * This function is only relevant to linux and will return
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 0516b1597b..839dce243f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1641,28 +1641,6 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
 }
 
 int
-rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len)
-{
-	if (len == 0) {
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	return container_dma_map(default_vfio_cfg, vaddr, iova, len);
-}
-
-int
-rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len)
-{
-	if (len == 0) {
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	return container_dma_unmap(default_vfio_cfg, vaddr, iova, len);
-}
-
-int
 rte_vfio_noiommu_is_enabled(void)
 {
 	int fd;
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 36e5389b3a..acc9d819a1 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -4447,6 +4447,44 @@ rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
 	return result;
 }
 
+int __rte_experimental
+rte_eth_dma_map(uint16_t port_id, uint64_t vaddr, uint64_t iova,
+		uint64_t len)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+	if (len == 0) {
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dma_map, -ENOTSUP);
+
+	return (*dev->dev_ops->dma_map)(dev, vaddr, iova, len);
+}
+
+int __rte_experimental
+rte_eth_dma_unmap(uint16_t port_id, uint64_t vaddr, uint64_t iova,
+		  uint64_t len)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+	if (len == 0) {
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dma_unmap, -ENOTSUP);
+
+	return (*dev->dev_ops->dma_unmap)(dev, vaddr, iova, len);
+}
+
 RTE_INIT(ethdev_init_log)
 {
 	rte_eth_dev_logtype = rte_log_register("lib.ethdev");
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 769a694309..ad695ab70b 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -4368,6 +4368,46 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * Map a memory region for DMA access by an ethdev.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param vaddr
+ *   Starting virtual address of memory to be mapped.
+ * @param iova
+ *   Starting IOVA address of memory to be mapped.
+ * @param len
+ *   Length of memory segment being mapped.
+ *
+ * @return
+ *   0 on success, or a negative value on error
+ *   (e.g. -ENODEV for an invalid port, -ENOTSUP if unsupported).
+ */
+int __rte_experimental
+rte_eth_dma_map(uint16_t port_id, uint64_t vaddr, uint64_t iova, uint64_t len);
+
+
+/**
+ * Unmap a memory region from an ethdev.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param vaddr
+ *   Starting virtual address of memory to be unmapped.
+ * @param iova
+ *   Starting IOVA address of memory to be unmapped.
+ * @param len
+ *   Length of memory segment being unmapped.
+ *
+ * @return
+ *   0 on success, or a negative value on error
+ *   (e.g. -ENODEV for an invalid port, -ENOTSUP if unsupported).
+ */
+int __rte_experimental
+rte_eth_dma_unmap(uint16_t port_id, uint64_t vaddr, uint64_t iova,
+		  uint64_t len);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 8f03f83f62..41568ca57f 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -377,6 +377,14 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
 						const char *pool);
 /**< @internal Test if a port supports specific mempool ops */
 
+typedef int (*eth_dma_map_t)(struct rte_eth_dev *dev, uint64_t vaddr,
+			     uint64_t iova, uint64_t len);
+/**< @internal Register memory for DMA usage by the device */
+
+typedef int (*eth_dma_unmap_t)(struct rte_eth_dev *dev, uint64_t vaddr,
+			       uint64_t iova, uint64_t len);
+/**< @internal Un-register memory from DMA usage by the device */
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -509,6 +517,12 @@ struct eth_dev_ops {
 
 	eth_pool_ops_supported_t pool_ops_supported;
 	/**< Test if a port supports specific mempool ops */
+
+	eth_dma_map_t dma_map;
+	/**< Register memory for DMA usage by the device */
+
+	eth_dma_unmap_t dma_unmap;
+	/**< Un-register memory from DMA usage by the device */
 };
 
 /**
-- 
2.12.0
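The ethdev dispatch added by this patch can be modeled outside DPDK as below. This hypothetical sketch (all toy_* names are invented) follows the shape of rte_eth_dma_map()/rte_eth_dma_unmap(): validate the port (a stand-in for RTE_ETH_VALID_PORTID_OR_ERR_RET), reject a zero length, return -ENOTSUP when the driver leaves the callback NULL, and otherwise call through the ops table. It reports a negative errno value on every error path and gives the unmap slot its own typedef; both are design suggestions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_eth_dev;

/* Same shapes as the eth_dma_map_t / eth_dma_unmap_t callbacks. */
typedef int (*toy_dma_map_t)(struct toy_eth_dev *dev, uint64_t vaddr,
			     uint64_t iova, uint64_t len);
typedef int (*toy_dma_unmap_t)(struct toy_eth_dev *dev, uint64_t vaddr,
			       uint64_t iova, uint64_t len);

struct toy_eth_dev_ops {
	toy_dma_map_t dma_map;     /* NULL: driver lacks DMA map support */
	toy_dma_unmap_t dma_unmap;
};

struct toy_eth_dev {
	const struct toy_eth_dev_ops *dev_ops;
	int attached;
	int nb_maps; /* bookkeeping used by the example driver below */
};

#define TOY_MAX_PORTS 4
static struct toy_eth_dev toy_devices[TOY_MAX_PORTS];

/* Example driver: only counts registrations. */
static int counting_map(struct toy_eth_dev *dev, uint64_t vaddr,
			uint64_t iova, uint64_t len)
{
	(void)vaddr; (void)iova; (void)len;
	dev->nb_maps++;
	return 0;
}

static int counting_unmap(struct toy_eth_dev *dev, uint64_t vaddr,
			  uint64_t iova, uint64_t len)
{
	(void)vaddr; (void)iova; (void)len;
	if (dev->nb_maps == 0)
		return -ENOENT;
	dev->nb_maps--;
	return 0;
}

static const struct toy_eth_dev_ops counting_ops = { counting_map,
						     counting_unmap };
static const struct toy_eth_dev_ops no_dma_ops = { NULL, NULL };

static int toy_eth_dma_map(uint16_t port_id, uint64_t vaddr, uint64_t iova,
			   uint64_t len)
{
	struct toy_eth_dev *dev;

	if (port_id >= TOY_MAX_PORTS || !toy_devices[port_id].attached)
		return -ENODEV;
	if (len == 0)
		return -EINVAL;
	dev = &toy_devices[port_id];
	if (dev->dev_ops->dma_map == NULL)
		return -ENOTSUP;
	return dev->dev_ops->dma_map(dev, vaddr, iova, len);
}

static int toy_eth_dma_unmap(uint16_t port_id, uint64_t vaddr, uint64_t iova,
			     uint64_t len)
{
	struct toy_eth_dev *dev;

	if (port_id >= TOY_MAX_PORTS || !toy_devices[port_id].attached)
		return -ENODEV;
	if (len == 0)
		return -EINVAL;
	dev = &toy_devices[port_id];
	if (dev->dev_ops->dma_unmap == NULL)
		return -ENOTSUP;
	return dev->dev_ops->dma_unmap(dev, vaddr, iova, len);
}
```

Returning a single error convention lets callers such as a rollback loop propagate the first failure unchanged, whichever layer (ethdev or driver) produced it.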


end of thread

Thread overview: 17+ messages
2018-11-04 12:41 [dpdk-dev] [RFC] ethdev: introduce DMA memory mapping for external memory Shahaf Shuler
2018-11-14 11:19 ` Burakov, Anatoly
2018-11-14 14:53   ` Shahaf Shuler
2018-11-14 17:06     ` Burakov, Anatoly
2018-11-15  9:46       ` Shahaf Shuler
2018-11-15 10:59         ` Burakov, Anatoly
2018-11-19 11:20           ` Shahaf Shuler
2018-11-19 17:18             ` Burakov, Anatoly
     [not found]               ` <DB7PR05MB442643DFD33B71797CD34B5EC3D90@DB7PR05MB4426.eurprd05.prod.outlook.com>
2018-11-20 10:55                 ` Burakov, Anatoly
2018-11-22 10:06                   ` Shahaf Shuler
2018-11-22 10:41                     ` Burakov, Anatoly
2018-11-22 11:31                       ` Shahaf Shuler
2018-11-22 11:34                         ` Burakov, Anatoly
2019-01-14  6:12                         ` Shahaf Shuler
2019-01-15 12:07                           ` Burakov, Anatoly
2019-01-16 11:04                             ` Shahaf Shuler
2018-11-19 17:04           ` Stephen Hemminger