From: Shahaf Shuler
To: dev@dpdk.org
Cc: olgas@mellanox.com, yskoh@mellanox.com, pawelx.wodkowski@intel.com, anatoly.burakov@intel.com, gowrishankar.m@linux.vnet.ibm.com, ferruh.yigit@intel.com, thomas@monjalon.net, arybchenko@solarflare.com, shreyansh.jain@nxp.com
Date: Sun, 4 Nov 2018 14:41:38 +0200
Subject: [dpdk-dev] [RFC] ethdev: introduce DMA memory mapping for external memory

Request for comments on the high-level changes in this patch.

The need to use external memory (memory that belongs to the application and is not part of the DPDK hugepages) already exists. It starts with storage applications, which prefer to manage their own memory blocks for efficient use of the storage device. It continues with GPU-based applications, which strive to achieve zero copy while processing the packet payload on the GPU cores. Finally, vSwitch/vRouter applications (e.g. VPP) simply prefer to have full control over the memory in use.

Recent work[1] in DPDK enabled the use of external memory; however, it mostly focuses on VFIO as the only way to map memory. While VFIO is common, other vendors use different ways to map memory (e.g. Mellanox and NXP[2]).
This patch moves DMA mapping to vendor-agnostic APIs located under ethdev. Ethdev was chosen because a memory mapping should be associated with specific port(s). Otherwise the memory is mapped multiple times, once per mapping framework, and memory is wasted on redundant translation tables in the host or in the device.

For example, consider a host with both Mellanox and Intel devices. Mapping memory without specifying the port results in both an IOMMU registration and a Verbs (Mellanox DMA map) registration. Another example is two Mellanox devices on the same host: the memory will be mapped for both, even though the application uses a separate mempool per device.

To use the suggested APIs, the application allocates a memory block and calls rte_eth_dma_map() for every port that needs DMA access to this memory. Later on, the application can use this memory to populate a mempool or to attach mbufs to an external buffer. When the memory should no longer be used by the device, the application calls rte_eth_dma_unmap() on every port it registered the memory with.

The drivers will implement DMA map/unmap, and they will very likely rely on the existing VFIO mapping. Support for device hotplug/unplug is out of the scope of this patch; however, it can be implemented in the same way as it is done for VFIO.
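The application flow described above can be sketched with a small self-contained C mock: one external memory block is mapped to each port that will DMA into it (rolling back if any port fails), and later unmapped from each of those ports. Everything named demo_* is hypothetical and is not DPDK code. demo_eth_dma_map()/demo_eth_dma_unmap() only mimic the checks done by the proposed rte_eth_dma_map()/rte_eth_dma_unmap() (invalid port, zero length, missing driver callback), and the mock driver records regions in a per-port table instead of programming an IOMMU or issuing a Verbs registration. One deliberate simplification: the mock returns -EINVAL for a zero length, whereas this RFC returns -1 and sets rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock, not DPDK code: all demo_* names are stand-ins. */
#define DEMO_MAX_REGIONS 8
struct demo_dev;
/* Stand-in for the dma_map/dma_unmap members this RFC adds to eth_dev_ops. */
struct demo_dev_ops {
	int (*dma_map)(struct demo_dev *dev, uint64_t vaddr, uint64_t iova, uint64_t len);
	int (*dma_unmap)(struct demo_dev *dev, uint64_t vaddr, uint64_t iova, uint64_t len);
};
struct demo_region { uint64_t vaddr, iova, len; int used; };
struct demo_dev {
	const struct demo_dev_ops *ops;
	struct demo_region regions[DEMO_MAX_REGIONS]; /* per-port translation table */
};

/* Mock driver: records the region instead of programming an IOMMU or the NIC. */
static int demo_drv_map(struct demo_dev *dev, uint64_t vaddr, uint64_t iova, uint64_t len)
{
	for (int i = 0; i < DEMO_MAX_REGIONS; i++) {
		if (!dev->regions[i].used) {
			dev->regions[i] = (struct demo_region){ vaddr, iova, len, 1 };
			return 0;
		}
	}
	return -ENOMEM;
}
static int demo_drv_unmap(struct demo_dev *dev, uint64_t vaddr, uint64_t iova, uint64_t len)
{
	for (int i = 0; i < DEMO_MAX_REGIONS; i++) {
		struct demo_region *r = &dev->regions[i];
		if (r->used && r->vaddr == vaddr && r->iova == iova && r->len == len) {
			r->used = 0;
			return 0;
		}
	}
	return -ENOENT;
}

static const struct demo_dev_ops demo_ops_dma = { demo_drv_map, demo_drv_unmap };
static const struct demo_dev_ops demo_ops_nodma = { NULL, NULL }; /* no DMA map support */
/* Ports 0 and 1 can map external memory; port 2's driver cannot. */
static struct demo_dev demo_devices[] = {
	{ .ops = &demo_ops_dma }, { .ops = &demo_ops_dma }, { .ops = &demo_ops_nodma },
};
#define DEMO_NB_PORTS (sizeof(demo_devices) / sizeof(demo_devices[0]))

/* Ethdev-style wrappers: validate, then dispatch to this port's driver only. */
static int demo_eth_dma_map(uint16_t port_id, uint64_t vaddr, uint64_t iova, uint64_t len)
{
	if (port_id >= DEMO_NB_PORTS)
		return -ENODEV;
	if (len == 0)
		return -EINVAL; /* the RFC returns -1 with rte_errno = EINVAL instead */
	struct demo_dev *dev = &demo_devices[port_id];
	return dev->ops->dma_map ? dev->ops->dma_map(dev, vaddr, iova, len) : -ENOTSUP;
}
static int demo_eth_dma_unmap(uint16_t port_id, uint64_t vaddr, uint64_t iova, uint64_t len)
{
	if (port_id >= DEMO_NB_PORTS)
		return -ENODEV;
	if (len == 0)
		return -EINVAL;
	struct demo_dev *dev = &demo_devices[port_id];
	return dev->ops->dma_unmap ? dev->ops->dma_unmap(dev, vaddr, iova, len) : -ENOTSUP;
}

/* Application: map one external block to every port that will DMA into it. */
static int demo_app_map_all(const uint16_t *ports, size_t n, void *buf, uint64_t iova, uint64_t len)
{
	uint64_t vaddr = (uint64_t)(uintptr_t)buf;
	assert(ports != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int ret = demo_eth_dma_map(ports[i], vaddr, iova, len);
		if (ret != 0) {
			while (i-- > 0) /* roll back the ports already mapped */
				demo_eth_dma_unmap(ports[i], vaddr, iova, len);
			return ret;
		}
	}
	return 0;
}
/* Application: unmap from every port it registered with; keep the first error. */
static int demo_app_unmap_all(const uint16_t *ports, size_t n, void *buf, uint64_t iova, uint64_t len)
{
	int first_err = 0;
	for (size_t i = 0; i < n; i++) {
		int ret = demo_eth_dma_unmap(ports[i], (uint64_t)(uintptr_t)buf, iova, len);
		if (ret != 0 && first_err == 0)
			first_err = ret;
	}
	return first_err;
}

/* External memory owned by the application (e.g. storage- or GPU-managed). */
static char demo_ext_mem[4096];
```

Because each port keeps its own table, a block used by a single port never consumes translation entries on the others, which is the redundancy argument made above for placing the API in ethdev. With the real API, the mapped block would then be used to populate a mempool or attached to mbufs as an external buffer; that step is omitted here.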
Cc: pawelx.wodkowski@intel.com
Cc: anatoly.burakov@intel.com
Cc: gowrishankar.m@linux.vnet.ibm.com
Cc: ferruh.yigit@intel.com
Cc: thomas@monjalon.net
Cc: arybchenko@solarflare.com
Cc: shreyansh.jain@nxp.com
Signed-off-by: Shahaf Shuler

[1] commit 73a639085938 ("vfio: allow to map other memory regions")
[2] http://mails.dpdk.org/archives/dev/2018-September/111978.html
---
 lib/librte_eal/bsdapp/eal/eal.c          | 14 ----------
 lib/librte_eal/common/include/rte_vfio.h | 44 --------------------------------
 lib/librte_eal/linuxapp/eal/eal_vfio.c   | 22 ----------------
 lib/librte_ethdev/rte_ethdev.c           | 38 +++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev.h           | 40 +++++++++++++++++++++++++++++
 lib/librte_ethdev/rte_ethdev_core.h      | 14 ++++++++++
 6 files changed, 92 insertions(+), 80 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 508cbc46fd..44667e8446 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -885,20 +885,6 @@ int rte_vfio_clear_group(__rte_unused int vfio_group_fd)
 }
 
 int
-rte_vfio_dma_map(uint64_t __rte_unused vaddr, __rte_unused uint64_t iova,
-		  __rte_unused uint64_t len)
-{
-	return -1;
-}
-
-int
-rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused iova,
-		    __rte_unused uint64_t len)
-{
-	return -1;
-}
-
-int
 rte_vfio_get_group_num(__rte_unused const char *sysfs_base,
 		      __rte_unused const char *dev_addr,
 		      __rte_unused int *iommu_group_num)
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index cae96fab90..d8d966dd33 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -188,50 +188,6 @@ int
 rte_vfio_clear_group(int vfio_group_fd);
 
 /**
- * Map memory region for use with VFIO.
- *
- * @note Require at least one device to be attached at the time of
- *       mapping. DMA maps done via this API will only apply to default
- *       container and will not apply to any of the containers created
- *       via rte_vfio_container_create().
- *
- * @param vaddr
- *   Starting virtual address of memory to be mapped.
- *
- * @param iova
- *   Starting IOVA address of memory to be mapped.
- *
- * @param len
- *   Length of memory segment being mapped.
- *
- * @return
- *   0 if success.
- *   -1 on error.
- */
-int
-rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
-
-
-/**
- * Unmap memory region from VFIO.
- *
- * @param vaddr
- *   Starting virtual address of memory to be unmapped.
- *
- * @param iova
- *   Starting IOVA address of memory to be unmapped.
- *
- * @param len
- *   Length of memory segment being unmapped.
- *
- * @return
- *   0 if success.
- *   -1 on error.
- */
-
-int
-rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
-/**
  * Parse IOMMU group number for a device
  *
  * This function is only relevant to linux and will return
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 0516b1597b..839dce243f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1641,28 +1641,6 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
 }
 
 int
-rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len)
-{
-	if (len == 0) {
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	return container_dma_map(default_vfio_cfg, vaddr, iova, len);
-}
-
-int
-rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len)
-{
-	if (len == 0) {
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	return container_dma_unmap(default_vfio_cfg, vaddr, iova, len);
-}
-
-int
 rte_vfio_noiommu_is_enabled(void)
 {
 	int fd;
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 36e5389b3a..acc9d819a1 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -4447,6 +4447,44 @@ rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
 	return result;
 }
 
+int __rte_experimental
+rte_eth_dma_map(uint16_t port_id, uint64_t vaddr, uint64_t iova,
+		uint64_t len)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+	if (len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dma_map, -ENOTSUP);
+
+	return (*dev->dev_ops->dma_map)(dev, vaddr, iova, len);
+}
+
+int __rte_experimental
+rte_eth_dma_unmap(uint16_t port_id, uint64_t vaddr, uint64_t iova,
+		  uint64_t len)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+	dev = &rte_eth_devices[port_id];
+	if (len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dma_unmap, -ENOTSUP);
+
+	return (*dev->dev_ops->dma_unmap)(dev, vaddr, iova, len);
+}
+
 RTE_INIT(ethdev_init_log)
 {
 	rte_eth_dev_logtype = rte_log_register("lib.ethdev");
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 769a694309..ad695ab70b 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -4368,6 +4368,46 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
 	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
 }
 
+/**
+ * Map memory region for use of an ethdev.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param vaddr
+ *   Starting virtual address of memory to be mapped.
+ * @param iova
+ *   Starting IOVA address of memory to be mapped.
+ * @param len
+ *   Length of memory segment being mapped.
+ *
+ * @return
+ *   0 if success.
+ *   Negative value on error.
+ */
+int
+rte_eth_dma_map(uint16_t port_id, uint64_t vaddr, uint64_t iova, uint64_t len);
+
+
+/**
+ * Unmap memory region from an ethdev.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param vaddr
+ *   Starting virtual address of memory to be unmapped.
+ * @param iova
+ *   Starting IOVA address of memory to be unmapped.
+ * @param len
+ *   Length of memory segment being unmapped.
+ *
+ * @return
+ *   0 if success.
+ *   Negative value on error.
+ */
+int
+rte_eth_dma_unmap(uint16_t port_id, uint64_t vaddr, uint64_t iova,
+		  uint64_t len);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 8f03f83f62..41568ca57f 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -377,6 +377,14 @@ typedef int (*eth_pool_ops_supported_t)(struct rte_eth_dev *dev,
 						const char *pool);
 /**< @internal Test if a port supports specific mempool ops */
 
+typedef int (*eth_dma_map_t)(struct rte_eth_dev *dev, uint64_t vaddr,
+			     uint64_t iova, uint64_t len);
+/**< @internal Register memory for DMA usage by the device */
+
+typedef int (*eth_dma_unmap_t)(struct rte_eth_dev *dev, uint64_t vaddr,
+			       uint64_t iova, uint64_t len);
+/**< @internal Un-register memory from DMA usage by the device */
+
 /**
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
@@ -509,6 +517,12 @@ struct eth_dev_ops {
 
 	eth_pool_ops_supported_t pool_ops_supported;
 	/**< Test if a port supports specific mempool ops */
+
+	eth_dma_map_t dma_map;
+	/**< Register memory for DMA usage by the device */
+
+	eth_dma_unmap_t dma_unmap;
+	/**< Un-register memory from DMA usage by the device */
 };
 
 /**
-- 
2.12.0