From: "Burakov, Anatoly"
To: Xiao Wang, ferruh.yigit@intel.com
Cc: dev@dpdk.org, maxime.coquelin@redhat.com, zhihong.wang@intel.com,
 tiwei.bie@intel.com, jianfeng.tan@intel.com, cunming.liang@intel.com,
 dan.daly@intel.com, thomas@monjalon.net, gaetan.rivet@6wind.com,
 hemant.agrawal@nxp.com, Junjie Chen
Subject: Re: [dpdk-dev] [PATCH v6 1/4] eal/vfio: add multiple container support
Date: Thu, 12 Apr 2018 15:03:38 +0100
Message-ID: <974c9cd0-87c4-6ab1-0787-9278a7379fda@intel.com>
In-Reply-To: <20180412071956.66178-2-xiao.w.wang@intel.com>
References: <20180405180701.16853-4-xiao.w.wang@intel.com>
 <20180412071956.66178-1-xiao.w.wang@intel.com>
 <20180412071956.66178-2-xiao.w.wang@intel.com>

On 12-Apr-18 8:19 AM, Xiao Wang wrote:
> Currently eal vfio framework binds vfio group fd to the default
> container fd during rte_vfio_setup_device, while in some cases,
> e.g. vDPA (vhost data path acceleration), we want to put vfio group
> to a separate container and program IOMMU via this container.
>
> This patch adds some APIs to support container creating and device
> binding with a container.
>
> A driver could use "rte_vfio_create_container" helper to create a
> new container from eal, use "rte_vfio_bind_group" to bind a device
> to the newly created container.
>
> During rte_vfio_setup_device, the container bound with the device
> will be used for IOMMU setup.
>
> Signed-off-by: Junjie Chen
> Signed-off-by: Xiao Wang
> Reviewed-by: Maxime Coquelin
> Reviewed-by: Ferruh Yigit
> ---

Apologies for the late review. Some comments below.

<...>

>
> +struct rte_memseg;
> +
>  /**
>   * Setup vfio_cfg for the device identified by its address.
>   * It discovers the configured I/O MMU groups or sets a new one for the device.
> @@ -131,6 +133,117 @@ rte_vfio_clear_group(int vfio_group_fd);
>  }
>  #endif
>

<...>

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Perform dma mapping for devices in a conainer.
> + *
> + * @param container_fd
> + *   the specified container fd
> + *
> + * @param dma_type
> + *   the dma map type
> + *
> + * @param ms
> + *   the dma address region to map
> + *
> + * @return
> + *    0 if successful
> + *   <0 if failed
> + */
> +int __rte_experimental
> +rte_vfio_dma_map(int container_fd, int dma_type, const struct rte_memseg *ms);
> +

First of all, why memseg instead of va/iova/len? This seems like an
unnecessary attachment to the internals of DPDK's memory
representation. Not all memory comes in memsegs, so this makes the API
unnecessarily specific to DPDK memory.

Also, why provide the DMA type? There's already a VFIO type pointer in
vfio_config - you can set this pointer for every newly created
container, so the user wouldn't have to care about the IOMMU type. Is
it not possible to figure out the DMA type from within EAL VFIO? If
not, maybe provide an API to do so, e.g.
rte_vfio_container_set_dma_type()?

This will also need to be rebased on top of the latest HEAD, because a
similar DMA map/unmap API has already been added, only without the
container parameter. Perhaps rename these new functions to
rte_vfio_container_(create|destroy|dma_map|dma_unmap)?

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Perform dma unmapping for devices in a conainer.
> + *
> + * @param container_fd
> + *   the specified container fd
> + *
> + * @param dma_type
> + *   the dma map type
> + *
> + * @param ms
> + *   the dma address region to unmap
> + *
> + * @return
> + *    0 if successful
> + *   <0 if failed
> + */
> +int __rte_experimental
> +rte_vfio_dma_unmap(int container_fd, int dma_type, const struct rte_memseg *ms);
> +
>  #endif /* VFIO_PRESENT */
>

<...>

> @@ -75,8 +53,8 @@ vfio_get_group_fd(int iommu_group_no)
>  		if (vfio_group_fd < 0) {
>  			/* if file not found, it's not an error */
>  			if (errno != ENOENT) {
> -				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
> -						strerror(errno));
> +				RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> +					filename, strerror(errno));

This looks like an unintended change.

>  				return -1;
>  			}
>
> @@ -86,8 +64,10 @@ vfio_get_group_fd(int iommu_group_no)
>  			vfio_group_fd = open(filename, O_RDWR);
>  			if (vfio_group_fd < 0) {
>  				if (errno != ENOENT) {
> -					RTE_LOG(ERR, EAL, "Cannot open %s: %s\n", filename,
> -							strerror(errno));
> +					RTE_LOG(ERR, EAL,
> +						"Cannot open %s: %s\n",
> +						filename,
> +						strerror(errno));

This looks like an unintended change.

>  					return -1;
>  				}
>  				return 0;
> @@ -95,21 +75,19 @@ vfio_get_group_fd(int iommu_group_no)
>  			/* noiommu group found */
>  		}
>
> -		cur_grp->group_no = iommu_group_no;
> -		cur_grp->fd = vfio_group_fd;
> -		vfio_cfg.vfio_active_groups++;
>  		return vfio_group_fd;
>  	}
> -	/* if we're in a secondary process, request group fd from the primary
> +	/*
> +	 * if we're in a secondary process, request group fd from the primary
>  	 * process via our socket
>  	 */

This looks like an unintended change.

>  	else {
> -		int socket_fd, ret;
> -
> -		socket_fd = vfio_mp_sync_connect_to_primary();
> +		int ret;
> +		int socket_fd = vfio_mp_sync_connect_to_primary();
>
>  		if (socket_fd < 0) {
> -			RTE_LOG(ERR, EAL, " cannot connect to primary process!\n");
> +			RTE_LOG(ERR, EAL,
> +				" cannot connect to primary process!\n");

This looks like an unintended change.

>  			return -1;
>  		}
>  		if (vfio_mp_sync_send_request(socket_fd, SOCKET_REQ_GROUP) < 0) {
> @@ -122,6 +100,7 @@ vfio_get_group_fd(int iommu_group_no)
>  			close(socket_fd);
>  			return -1;
>  		}
> +
>  		ret = vfio_mp_sync_receive_request(socket_fd);

This looks like an unintended change.
(hint: "git revert -n HEAD && git add -p" is your friend :) )

>  		switch (ret) {
>  		case SOCKET_NO_FD:
> @@ -132,9 +111,6 @@ vfio_get_group_fd(int iommu_group_no)
>  			/* if we got the fd, store it and return it */
>  			if (vfio_group_fd > 0) {
>  				close(socket_fd);
> -				cur_grp->group_no = iommu_group_no;
> -				cur_grp->fd = vfio_group_fd;
> -				vfio_cfg.vfio_active_groups++;
>  				return vfio_group_fd;
>  			}
>  			/* fall-through on error */
> @@ -147,70 +123,349 @@ vfio_get_group_fd(int iommu_group_no)
>  	return -1;

<...>

> +int __rte_experimental
> +rte_vfio_create_container(void)
> +{
> +	struct vfio_config *vfio_cfg;
> +	int i;
> +
> +	/* Find an empty slot to store new vfio config */
> +	for (i = 1; i < VFIO_MAX_CONTAINERS; i++) {
> +		if (vfio_cfgs[i] == NULL)
> +			break;
> +	}
> +
> +	if (i == VFIO_MAX_CONTAINERS) {
> +		RTE_LOG(ERR, EAL, "exceed max vfio container limit\n");
> +		return -1;
> +	}
> +
> +	vfio_cfgs[i] = rte_zmalloc("vfio_container", sizeof(struct vfio_config),
> +		RTE_CACHE_LINE_SIZE);
> +	if (vfio_cfgs[i] == NULL)
> +		return -ENOMEM;

Is there a specific reason why 1) dynamic allocation is used (as
opposed to just storing a static array), and 2) DPDK memory allocation
is used? This seems like an unnecessary complication.

Even if you were to decide to allocate memory instead of having a
static array, you'll have to register for rte_eal_cleanup() to delete
any allocated containers on DPDK exit. But, as I said, I think it would
be better to keep it as a static array.
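
To make the static array suggestion a bit more concrete, here is a
rough sketch of what I have in mind. It is not the actual EAL code:
every sketch_* name, the in_use flag and both limits are made up for
illustration, and I am only assuming (from the loop above starting at
1) that slot 0 is meant for the default container.

```c
#include <string.h>

/* Hypothetical limits; the patch's real constants are VFIO_MAX_CONTAINERS
 * and VFIO_MAX_GROUPS, whose values are not shown in the quoted hunks. */
#define SKETCH_MAX_CONTAINERS 8
#define SKETCH_MAX_GROUPS 4

struct sketch_group {
	int group_no;
	int fd;
	int devices;
};

struct sketch_container {
	int in_use;		/* stands in for the NULL check on vfio_cfgs[i] */
	int container_fd;
	int active_groups;
	struct sketch_group groups[SKETCH_MAX_GROUPS];
};

/* Static storage: nothing to allocate and nothing to free on exit.
 * Slot 0 is left alone for the default container (assumption). */
static struct sketch_container sketch_cfgs[SKETCH_MAX_CONTAINERS];

/* Claim a free slot for an already opened container fd.
 * Returns the slot index, or -1 if the fd is invalid or no slot is free. */
int
sketch_container_slot_claim(int container_fd)
{
	struct sketch_container *cfg;
	int i, j;

	if (container_fd < 0)
		return -1;

	for (i = 1; i < SKETCH_MAX_CONTAINERS; i++) {
		if (!sketch_cfgs[i].in_use)
			break;
	}
	if (i == SKETCH_MAX_CONTAINERS)
		return -1;

	cfg = &sketch_cfgs[i];
	memset(cfg, 0, sizeof(*cfg));
	cfg->in_use = 1;
	cfg->container_fd = container_fd;
	for (j = 0; j < SKETCH_MAX_GROUPS; j++) {
		cfg->groups[j].group_no = -1;
		cfg->groups[j].fd = -1;
	}
	return i;
}

/* Release a slot; the caller is expected to have closed the fd already.
 * Returns 0 on success, -1 if the slot is out of range or not in use. */
int
sketch_container_slot_release(int slot)
{
	if (slot < 1 || slot >= SKETCH_MAX_CONTAINERS ||
			!sketch_cfgs[slot].in_use)
		return -1;
	sketch_cfgs[slot].in_use = 0;
	return 0;
}
```

With something along these lines, the create path would only need to
open the container fd and claim a slot, and there would be no
allocation to undo in the error path or at exit.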
> +
> +	RTE_LOG(INFO, EAL, "alloc container at slot %d\n", i);
> +	vfio_cfg = vfio_cfgs[i];
> +	vfio_cfg->vfio_active_groups = 0;
> +	vfio_cfg->vfio_container_fd = vfio_get_container_fd();
> +
> +	if (vfio_cfg->vfio_container_fd < 0) {
> +		rte_free(vfio_cfgs[i]);
> +		vfio_cfgs[i] = NULL;
> +		return -1;
> +	}
> +
> +	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
> +		vfio_cfg->vfio_groups[i].group_no = -1;
> +		vfio_cfg->vfio_groups[i].fd = -1;
> +		vfio_cfg->vfio_groups[i].devices = 0;
> +	}

<...>

> @@ -665,41 +931,80 @@ vfio_get_group_no(const char *sysfs_base,
>  }
>
>  static int
> -vfio_type1_dma_map(int vfio_container_fd)
> +do_vfio_type1_dma_map(int vfio_container_fd, const struct rte_memseg *ms)

<...>

> +static int
> +do_vfio_type1_dma_unmap(int vfio_container_fd, const struct rte_memseg *ms)

APIs such as these two were recently added to DPDK.

-- 
Thanks,
Anatoly