From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 0A6461B1A6 for ; Mon, 16 Apr 2018 12:03:07 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 16 Apr 2018 03:03:06 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.48,459,1517904000"; d="scan'208";a="48222668" Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.237.220.128]) ([10.237.220.128]) by orsmga001.jf.intel.com with ESMTP; 16 Apr 2018 03:03:02 -0700 To: Xiao Wang , ferruh.yigit@intel.com Cc: dev@dpdk.org, maxime.coquelin@redhat.com, zhihong.wang@intel.com, tiwei.bie@intel.com, jianfeng.tan@intel.com, cunming.liang@intel.com, dan.daly@intel.com, thomas@monjalon.net, Junjie Chen References: <20180412071956.66178-2-xiao.w.wang@intel.com> <20180415153349.62105-1-xiao.w.wang@intel.com> <20180415153349.62105-3-xiao.w.wang@intel.com> From: "Burakov, Anatoly" Message-ID: Date: Mon, 16 Apr 2018 11:03:01 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: <20180415153349.62105-3-xiao.w.wang@intel.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v7 2/5] vfio: add multi container support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 Apr 2018 10:03:08 -0000 On 15-Apr-18 4:33 PM, Xiao Wang wrote: > This patch adds APIs to support container create/destroy and device > bind/unbind with a container. It also provides API for IOMMU programing > on a specified container. > > A driver could use "rte_vfio_create_container" helper to create a ^^ wrong API name in commit message :) > new container from eal, use "rte_vfio_bind_group" to bind a device > to the newly created container. During rte_vfio_setup_device the > container bound with the device will be used for IOMMU setup. > > Signed-off-by: Junjie Chen > Signed-off-by: Xiao Wang > Reviewed-by: Maxime Coquelin > Reviewed-by: Ferruh Yigit > --- > lib/librte_eal/bsdapp/eal/eal.c | 52 +++++ > lib/librte_eal/common/include/rte_vfio.h | 119 ++++++++++++ > lib/librte_eal/linuxapp/eal/eal_vfio.c | 316 +++++++++++++++++++++++++++++++ > lib/librte_eal/rte_eal_version.map | 6 + > 4 files changed, 493 insertions(+) > > diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c > index 727adc5d2..c5106d0d6 100644 > --- a/lib/librte_eal/bsdapp/eal/eal.c > +++ b/lib/librte_eal/bsdapp/eal/eal.c > @@ -769,6 +769,14 @@ int rte_vfio_noiommu_is_enabled(void); > int rte_vfio_clear_group(int vfio_group_fd); > int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len); > int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len); > +int rte_vfio_container_create(void); > +int rte_vfio_container_destroy(int container_fd); > +int rte_vfio_bind_group(int container_fd, int iommu_group_no); > +int rte_vfio_unbind_group(int container_fd, int iommu_group_no); Maybe have these under "container" too? e.g. rte_vfio_container_group_bind/unbind? Seems like it would be more consistent that way - anything to do with custom containers would be under rte_vfio_container_* namespace. > +int rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, > + uint64_t iova, uint64_t len); > +int rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, > + uint64_t iova, uint64_t len); > > int rte_vfio_setup_device(__rte_unused const char *sysfs_base, > __rte_unused const char *dev_addr, > @@ -818,3 +826,47 @@ rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused iova, > { > return -1; > } > + <...> > diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h > index d26ab01cb..0c1509b29 100644 > --- a/lib/librte_eal/common/include/rte_vfio.h > +++ b/lib/librte_eal/common/include/rte_vfio.h > @@ -168,6 +168,125 @@ rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len); > int __rte_experimental > rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len); > > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice > + * > + * Create a new container for device binding. I would add a note that any newly allocated DPDK memory will not be mapped into these containers by default. > + * > + * @return > + * the container fd if successful > + * <0 if failed > + */ > +int __rte_experimental > +rte_vfio_container_create(void); > + <...> > + * 0 if successful > + * <0 if failed > + */ > +int __rte_experimental > +rte_vfio_unbind_group(int container_fd, int iommu_group_no); > + > +/** > + * @warning > + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice > + * > + * Perform dma mapping for devices in a conainer. Here and in other places: "dma" should be DMA, and typo: "conainer" :) I think you should also add a note to the original API (not this one, but the old one) that DMA maps done via that API will only apply to default container and will not apply to any of the containers created via container_create(). IOW, documentation should make it clear that if you use this functionality, you're on your own and you have to manage your own DMA mappings for any containers you create. > + * > + * @param container_fd > + * the specified container fd > + * > + * @param vaddr > + * Starting virtual address of memory to be mapped. > + * <...> > + > +int __rte_experimental > +rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova, > + uint64_t len) > +{ > + struct user_mem_map *new_map; > + struct vfio_config *vfio_cfg; > + struct user_mem_maps *user_mem_maps; > + int ret = 0; > + > + if (len == 0) { > + rte_errno = EINVAL; > + return -1; > + } > + > + vfio_cfg = get_vfio_cfg_by_container_fd(container_fd); > + if (vfio_cfg == NULL) { > + RTE_LOG(ERR, EAL, "Invalid container fd\n"); > + return -1; > + } > + > + user_mem_maps = &vfio_cfg->mem_maps; > + rte_spinlock_recursive_lock(&user_mem_maps->lock); > + if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) { > + RTE_LOG(ERR, EAL, "No more space for user mem maps\n"); > + rte_errno = ENOMEM; > + ret = -1; > + goto out; > + } > + /* map the entry */ > + if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 1)) { > + /* technically, this will fail if there are currently no devices > + * plugged in, even if a device were added later, this mapping > + * might have succeeded. however, since we cannot verify if this > + * is a valid mapping without having a device attached, consider > + * this to be unsupported, because we can't just store any old > + * mapping and pollute list of active mappings willy-nilly. > + */ > + RTE_LOG(ERR, EAL, "Couldn't map new region for DMA\n"); > + ret = -1; > + goto out; > + } > + /* create new user mem map entry */ > + new_map = &user_mem_maps->maps[user_mem_maps->n_maps++]; > + new_map->addr = vaddr; > + new_map->iova = iova; > + new_map->len = len; > + > + compact_user_maps(user_mem_maps); > +out: > + rte_spinlock_recursive_unlock(&user_mem_maps->lock); > + return ret; Please correct me if i'm wrong, but it looks like you've just duplicated the code for rte_vfio_dma_map() here and made a few small changes. It would be better if you moved most of this into a static function (e.g. static int container_dma_map(vfio_cfg, vaddr, iova, len)) and called it with either default vfio_cfg from rte_vfio_dma_map, or found vfio_cfg from rte_vfio_container_dma_map. Same applies to function below. > +} > + > +int __rte_experimental > +rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova, > + uint64_t len) > +{ > + struct user_mem_map *map, *new_map = NULL; > + struct vfio_config *vfio_cfg; > + struct user_mem_maps *user_mem_maps; > + int ret = 0; > + > + if (len == 0) { > + rte_errno = EINVAL; > + return -1; > + } > + <...> -- Thanks, Anatoly