From: Anatoly Burakov
To: dev@dpdk.org
Cc: Bruce Richardson, keith.wiles@intel.com, jianfeng.tan@intel.com,
 andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
 benjamin.walker@intel.com, thomas@monjalon.net, konstantin.ananyev@intel.com,
 kuralamudhan.ramakrishnan@intel.com, louise.m.daly@intel.com,
 nelio.laranjeiro@6wind.com, yskoh@mellanox.com, pepperjo@japf.ch,
 jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
 olivier.matz@6wind.com, Pawel Wodkowski
Date: Wed, 7 Mar 2018 16:56:52 +0000
Subject: [dpdk-dev] [PATCH v2 24/41] vfio: allow to map other memory regions

Currently it is not possible to use memory that is not owned by DPDK for
DMA. This scenario can arise in vhost applications (such as SPDK), where
the guest sends its own memory table. To fill this gap, provide an API to
register an arbitrary address range with the VFIO container.
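For illustration, a minimal usage sketch of the new API (not part of the
patch). It assumes an initialized EAL with an active VFIO container, an
externally allocated page-aligned buffer, and a 1:1 VA-to-IOVA scheme,
which holds only for particular IOMMU configurations; a real vhost backend
would instead derive the IOVA from its translation of the guest memory
table. The function name map_external_buffer is hypothetical.

#include <stdint.h>

#include <rte_vfio.h>

/* Illustrative only: register an external, page-aligned buffer for
 * device DMA, then remove the mapping once the device is done with it.
 */
static int
map_external_buffer(void *buf, uint64_t len)
{
	uint64_t vaddr = (uint64_t)(uintptr_t)buf;
	uint64_t iova = vaddr;	/* assumption: IOVA == VA */

	if (rte_vfio_dma_map(vaddr, iova, len) < 0)
		return -1;	/* VFIO inactive, or no user-map support */

	/* ... device performs DMA to/from buf here ... */

	return rte_vfio_dma_unmap(vaddr, iova, len);
}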
Signed-off-by: Pawel Wodkowski
Signed-off-by: Anatoly Burakov
---
 lib/librte_eal/bsdapp/eal/eal.c          |  16 ++++
 lib/librte_eal/common/include/rte_vfio.h |  39 ++++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c   | 153 ++++++++++++++++++++++++++-----
 lib/librte_eal/linuxapp/eal/eal_vfio.h   |  11 +++
 4 files changed, 196 insertions(+), 23 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 3b06e21..5a7f436 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -755,6 +755,8 @@ int rte_vfio_enable(const char *modname);
 int rte_vfio_is_enabled(const char *modname);
 int rte_vfio_noiommu_is_enabled(void);
 int rte_vfio_clear_group(int vfio_group_fd);
+int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
+int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
 
 int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
 		      __rte_unused const char *dev_addr,
@@ -790,3 +792,17 @@ int rte_vfio_clear_group(__rte_unused int vfio_group_fd)
 {
 	return 0;
 }
+
+int
+rte_vfio_dma_map(uint64_t __rte_unused vaddr, __rte_unused uint64_t iova,
+		__rte_unused uint64_t len)
+{
+	return -1;
+}
+
+int
+rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused iova,
+		__rte_unused uint64_t len)
+{
+	return -1;
+}
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index e981a62..093c309 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -123,6 +123,45 @@ int rte_vfio_noiommu_is_enabled(void);
 int
 rte_vfio_clear_group(int vfio_group_fd);
 
+/**
+ * Map memory region for use with VFIO.
+ *
+ * @param vaddr
+ *   Starting virtual address of memory to be mapped.
+ *
+ * @param iova
+ *   Starting IOVA address of memory to be mapped.
+ *
+ * @param len
+ *   Length of memory segment being mapped.
+ *
+ * @return
+ *   0 if success.
+ *   -1 on error.
+ */
+int
+rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
+
+
+/**
+ * Unmap memory region from VFIO.
+ *
+ * @param vaddr
+ *   Starting virtual address of memory to be unmapped.
+ *
+ * @param iova
+ *   Starting IOVA address of memory to be unmapped.
+ *
+ * @param len
+ *   Length of memory segment being unmapped.
+ *
+ * @return
+ *   0 if success.
+ *   -1 on error.
+ */
+int
+rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
+
 #endif /* VFIO_PRESENT */
 
 #endif /* _RTE_VFIO_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 5192763..8fe8984 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -22,17 +22,35 @@ static struct vfio_config vfio_cfg;
 
 static int vfio_type1_dma_map(int);
+static int vfio_type1_dma_mem_map(int, uint64_t, uint64_t, uint64_t, int);
 static int vfio_spapr_dma_map(int);
 static int vfio_noiommu_dma_map(int);
+static int vfio_noiommu_dma_mem_map(int, uint64_t, uint64_t, uint64_t, int);
 
 /* IOMMU types we support */
 static const struct vfio_iommu_type iommu_types[] = {
 	/* x86 IOMMU, otherwise known as type 1 */
-	{ RTE_VFIO_TYPE1, "Type 1", &vfio_type1_dma_map},
+	{
+		.type_id = RTE_VFIO_TYPE1,
+		.name = "Type 1",
+		.dma_map_func = &vfio_type1_dma_map,
+		.dma_user_map_func = &vfio_type1_dma_mem_map
+	},
 	/* ppc64 IOMMU, otherwise known as spapr */
-	{ RTE_VFIO_SPAPR, "sPAPR", &vfio_spapr_dma_map},
+	{
+		.type_id = RTE_VFIO_SPAPR,
+		.name = "sPAPR",
+		.dma_map_func = &vfio_spapr_dma_map,
+		/* TODO: work with PPC64 people on enabling this, window size! */
+		.dma_user_map_func = NULL
+	},
 	/* IOMMU-less mode */
-	{ RTE_VFIO_NOIOMMU, "No-IOMMU", &vfio_noiommu_dma_map},
+	{
+		.type_id = RTE_VFIO_NOIOMMU,
+		.name = "No-IOMMU",
+		.dma_map_func = &vfio_noiommu_dma_map,
+		.dma_user_map_func = &vfio_noiommu_dma_mem_map
+	},
 };
 
 int
@@ -333,9 +351,10 @@ rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
 	 */
 	if (internal_config.process_type == RTE_PROC_PRIMARY &&
 			vfio_cfg.vfio_active_groups == 1) {
+		const struct vfio_iommu_type *t;
+
 		/* select an IOMMU type which we will be using */
-		const struct vfio_iommu_type *t =
-			vfio_set_iommu_type(vfio_cfg.vfio_container_fd);
+		t = vfio_set_iommu_type(vfio_cfg.vfio_container_fd);
 		if (!t) {
 			RTE_LOG(ERR, EAL,
 				"  %s failed to select IOMMU type\n",
@@ -353,6 +372,8 @@ rte_vfio_setup_device(const char *sysfs_base, const char *dev_addr,
 			rte_vfio_clear_group(vfio_group_fd);
 			return -1;
 		}
+
+		vfio_cfg.vfio_iommu_type = t;
 	}
 }
 
@@ -665,13 +686,54 @@ vfio_get_group_no(const char *sysfs_base,
 }
 
 static int
+vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
+		uint64_t len, int do_map)
+{
+	struct vfio_iommu_type1_dma_map dma_map;
+	struct vfio_iommu_type1_dma_unmap dma_unmap;
+	int ret;
+
+	if (do_map != 0) {
+		memset(&dma_map, 0, sizeof(dma_map));
+		dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
+		dma_map.vaddr = vaddr;
+		dma_map.size = len;
+		dma_map.iova = iova;
+		dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
+				VFIO_DMA_MAP_FLAG_WRITE;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, error %i (%s)\n",
+				errno, strerror(errno));
+			return -1;
+		}
+
+	} else {
+		memset(&dma_unmap, 0, sizeof(dma_unmap));
+		dma_unmap.argsz = sizeof(struct vfio_iommu_type1_dma_unmap);
+		dma_unmap.size = len;
+		dma_unmap.iova = iova;
+
+		ret = ioctl(vfio_container_fd, VFIO_IOMMU_UNMAP_DMA,
+				&dma_unmap);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "  cannot clear DMA remapping, error %i (%s)\n",
+				errno, strerror(errno));
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+static int
 vfio_type1_dma_map(int vfio_container_fd)
 {
-	int i, ret;
+	int i;
 
 	/* map all DPDK segments for DMA. use 1:1 PA to IOVA mapping */
 	for (i = 0; i < RTE_MAX_MEMSEG_LISTS; i++) {
-		struct vfio_iommu_type1_dma_map dma_map;
 		struct rte_memseg_list *msl;
 		struct rte_fbarray *arr;
 		int ms_idx, next_idx;
@@ -697,23 +759,9 @@ vfio_type1_dma_map(int vfio_container_fd)
 			len = ms->hugepage_sz;
 			hw_addr = ms->iova;
 
-			memset(&dma_map, 0, sizeof(dma_map));
-			dma_map.argsz = sizeof(struct vfio_iommu_type1_dma_map);
-			dma_map.vaddr = addr;
-			dma_map.size = len;
-			dma_map.iova = hw_addr;
-			dma_map.flags = VFIO_DMA_MAP_FLAG_READ |
-					VFIO_DMA_MAP_FLAG_WRITE;
-
-			ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA,
-					&dma_map);
-
-			if (ret) {
-				RTE_LOG(ERR, EAL, "  cannot set up DMA remapping, "
-						"error %i (%s)\n", errno,
-						strerror(errno));
+			if (vfio_type1_dma_mem_map(vfio_container_fd, addr,
+					hw_addr, len, 1))
 				return -1;
-			}
 		}
 	}
 
@@ -865,6 +913,49 @@ vfio_noiommu_dma_map(int __rte_unused vfio_container_fd)
 	return 0;
 }
 
+static int
+vfio_noiommu_dma_mem_map(int __rte_unused vfio_container_fd,
+			uint64_t __rte_unused vaddr,
+			uint64_t __rte_unused iova, uint64_t __rte_unused len,
+			int __rte_unused do_map)
+{
+	/* No-IOMMU mode does not need DMA mapping */
+	return 0;
+}
+
+static int
+vfio_dma_mem_map(uint64_t vaddr, uint64_t iova, uint64_t len, int do_map)
+{
+	const struct vfio_iommu_type *t = vfio_cfg.vfio_iommu_type;
+
+	if (!t) {
+		RTE_LOG(ERR, EAL, "  VFIO support not initialized\n");
+		return -1;
+	}
+
+	if (!t->dma_user_map_func) {
+		RTE_LOG(ERR, EAL,
+			"  VFIO custom DMA region mapping not supported by IOMMU %s\n",
+			t->name);
+		return -1;
+	}
+
+	return t->dma_user_map_func(vfio_cfg.vfio_container_fd, vaddr, iova,
+			len, do_map);
+}
+
+int
+rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len)
+{
+	return vfio_dma_mem_map(vaddr, iova, len, 1);
+}
+
+int
+rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len)
+{
+	return vfio_dma_mem_map(vaddr, iova, len, 0);
+}
+
 int
 rte_vfio_noiommu_is_enabled(void)
 {
@@ -897,4 +988,20 @@ rte_vfio_noiommu_is_enabled(void)
 	return c == 'Y';
 }
 
+#else
+
+int
+rte_vfio_dma_map(uint64_t __rte_unused vaddr, __rte_unused uint64_t iova,
+		__rte_unused uint64_t len)
+{
+	return -1;
+}
+
+int
+rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused iova,
+		__rte_unused uint64_t len)
+{
+	return -1;
+}
+
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.h b/lib/librte_eal/linuxapp/eal/eal_vfio.h
index 8059577..b68703e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.h
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.h
@@ -19,6 +19,7 @@
 
 #ifdef VFIO_PRESENT
 
+#include <stdint.h>
 #include <linux/vfio.h>
 
 #define RTE_VFIO_TYPE1 VFIO_TYPE1_IOMMU
@@ -110,6 +111,7 @@ struct vfio_config {
 	int vfio_enabled;
 	int vfio_container_fd;
 	int vfio_active_groups;
+	const struct vfio_iommu_type *vfio_iommu_type;
 	struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
 };
 
@@ -119,9 +121,18 @@ struct vfio_config {
  * */
 typedef int (*vfio_dma_func_t)(int);
 
+/* Custom memory region DMA mapping function prototype.
+ * Takes VFIO container fd, virtual address, physical address, length and
+ * operation type (0 to unmap, 1 to map) as parameters.
+ * Returns 0 on success, -1 on error.
+ */
+typedef int (*vfio_dma_user_func_t)(int fd, uint64_t vaddr, uint64_t iova,
+		uint64_t len, int do_map);
+
 struct vfio_iommu_type {
 	int type_id;
 	const char *name;
+	vfio_dma_user_func_t dma_user_map_func;
 	vfio_dma_func_t dma_map_func;
 };
 
-- 
2.7.4
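A note on failure modes: in this revision only the Type 1 and No-IOMMU
backends supply a dma_user_map_func; sPAPR leaves it NULL pending the
window-size work flagged in the TODO above, and the BSD stubs always
return -1. Callers should therefore treat failure as an expected outcome.
A hypothetical caller-side fallback (buf_vaddr, buf_iova, buf_len and
use_dpdk_owned_memory() are illustrative names, not part of the patch):

	if (rte_vfio_dma_map(buf_vaddr, buf_iova, buf_len) < 0) {
		/* VFIO is not active, or the selected IOMMU type has no
		 * user DMA map support; fall back to DPDK-owned memory.
		 */
		use_dpdk_owned_memory();
	}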