DPDK patches and discussions
From: "Xia, Chenbo" <chenbo.xia@intel.com>
To: "Ding, Xuan" <xuan.ding@intel.com>,
	"maxime.coquelin@redhat.com" <maxime.coquelin@redhat.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, "Hu, Jiayu" <jiayu.hu@intel.com>,
	"Wang, YuanX" <yuanx.wang@intel.com>,
	"He, Xingguang" <xingguang.he@intel.com>
Subject: RE: [PATCH v3] vhost: fix physical address mapping
Date: Mon, 15 Nov 2021 07:20:50 +0000	[thread overview]
Message-ID: <SN6PR11MB3504FE01BAE914051B6592559C989@SN6PR11MB3504.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20211110060641.7666-1-xuan.ding@intel.com>

Hi Xuan,

> -----Original Message-----
> From: Ding, Xuan <xuan.ding@intel.com>
> Sent: Wednesday, November 10, 2021 2:07 PM
> To: maxime.coquelin@redhat.com; Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; Hu, Jiayu <jiayu.hu@intel.com>; Wang, YuanX
> <yuanx.wang@intel.com>; He, Xingguang <xingguang.he@intel.com>; Ding, Xuan
> <xuan.ding@intel.com>
> Subject: [PATCH v3] vhost: fix physical address mapping
> 
> When choosing IOVA as PA mode, IOVA is likely to be discontinuous,
> which requires page by page mapping for DMA devices. To be consistent,
> this patch implements page by page mapping instead of mapping at the
> region granularity for both IOVA as VA and PA mode.
> 
> Fixes: 7c61fa08b716 ("vhost: enable IOMMU for async vhost")
> 
> Signed-off-by: Xuan Ding <xuan.ding@intel.com>
> ---
> 
> v3:
> * Fix commit title.
> 
> v2:
> * Fix a format issue.
> ---
>  lib/vhost/vhost.h      |   1 +
>  lib/vhost/vhost_user.c | 105 ++++++++++++++++++++---------------------
>  2 files changed, 53 insertions(+), 53 deletions(-)
> 
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 7085e0885c..d246538ca5 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -355,6 +355,7 @@ struct vring_packed_desc_event {
>  struct guest_page {
>  	uint64_t guest_phys_addr;
>  	uint64_t host_phys_addr;
> +	uint64_t host_user_addr;
>  	uint64_t size;
>  };
> 
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index a781346c4d..37cdedda3c 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -144,52 +144,55 @@ get_blk_size(int fd)
>  }
> 
>  static int
> -async_dma_map(struct rte_vhost_mem_region *region, bool do_map)
> +async_dma_map(struct virtio_net *dev, bool do_map)
>  {
> -	uint64_t host_iova;
>  	int ret = 0;
> -
> -	host_iova = rte_mem_virt2iova((void *)(uintptr_t)region->host_user_addr);
> +	uint32_t i;
> +	struct guest_page *page;
>  	if (do_map) {
> -		/* Add mapped region into the default container of DPDK. */
> -		ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
> -						 region->host_user_addr,
> -						 host_iova,
> -						 region->size);
> -		if (ret) {
> -			/*
> -			 * DMA device may bind with kernel driver, in this case,
> -			 * we don't need to program IOMMU manually. However, if no
> -			 * device is bound with vfio/uio in DPDK, and vfio kernel
> -			 * module is loaded, the API will still be called and return
> -			 * with ENODEV/ENOSUP.
> -			 *
> -			 * DPDK vfio only returns ENODEV/ENOSUP in very similar
> -			 * situations(vfio either unsupported, or supported
> -			 * but no devices found). Either way, no mappings could be
> -			 * performed. We treat it as normal case in async path.
> -			 */
> -			if (rte_errno == ENODEV || rte_errno == ENOTSUP)
> +		for (i = 0; i < dev->nr_guest_pages; i++) {
> +			page = &dev->guest_pages[i];
> +			ret = rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
> +							 page->host_user_addr,
> +							 page->host_phys_addr,
> +							 page->size);
> +			if (ret) {
> +				/*
> +				 * DMA device may bind with kernel driver, in this case,
> +				 * we don't need to program IOMMU manually. However, if no
> +				 * device is bound with vfio/uio in DPDK, and vfio kernel
> +				 * module is loaded, the API will still be called and return
> +				 * with ENODEV/ENOSUP.
> +				 *
> +				 * DPDK vfio only returns ENODEV/ENOSUP in very similar
> +				 * situations(vfio either unsupported, or supported
> +				 * but no devices found). Either way, no mappings could be
> +				 * performed. We treat it as normal case in async path.
> +				 */
> +				if (rte_errno == ENODEV || rte_errno == ENOTSUP)
> +					return 0;

I don't think this logic is good enough: it only covers the kernel-driver case,
where the mapping is unneeded. It could also be a vfio-driver case with an incorrect
mapping. And it's not safe to assume ENODEV and ENOTSUP only come from DPDK; they
could come from the kernel.
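To make the concern concrete, here is a minimal sketch (the names `map_outcome` and `classify_map_result` are illustrative, not DPDK API) of the classification the patch effectively performs: every ENODEV/ENOTSUP failure is treated as the benign "no vfio backend bound in DPDK" case, even though the kernel can hand back the very same errno for a genuine vfio failure.

```c
#include <errno.h>

/*
 * Illustrative sketch (not DPDK API): how the quoted patch classifies a
 * dma-map result. Any ENODEV/ENOTSUP failure is assumed to be the benign
 * "no vfio device bound in DPDK" case -- but the kernel can return the
 * same errno for a genuine vfio failure, which this scheme cannot see.
 */
enum map_outcome { MAP_OK, MAP_BENIGN_SKIP, MAP_REAL_FAILURE };

static enum map_outcome classify_map_result(int ret, int err)
{
	if (ret == 0)
		return MAP_OK;
	/* Assumed benign: vfio unsupported, or supported but no device. */
	if (err == ENODEV || err == ENOTSUP)
		return MAP_BENIGN_SKIP;
	return MAP_REAL_FAILURE;
}
```

A kernel-originated ENODEV lands in `MAP_BENIGN_SKIP` exactly like the DPDK "nothing to map" case, which is the ambiguity described above.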

> +
> +				VHOST_LOG_CONFIG(ERR, "DMA engine map failed\n");
> +				/* DMA mapping errors won't stop VHOST_USER_SET_MEM_TABLE. */
>  				return 0;

I understand this function covers many cases that are difficult to differentiate,
so you don't check the return value but use the log here to inform users.

I suggest using a WARNING log (since this path can also trigger in the kernel-driver
case, where everything is actually correct) and printing the errno info for users.

Note: this is only a workaround, not a perfect solution. But since vhost with dmadev
is in progress, the vhost lib will most likely become aware of the dmadev ID, so the
problem could be solved later (perhaps some dmadev API could be used to learn the
VA/PA mode and whether a kernel or user driver is bound?).
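As a sketch of the suggested diagnostic (`format_map_warning` is a hypothetical stand-in for the real `VHOST_LOG_CONFIG(WARNING, ...)` call, and the message wording is mine, not from the patch), the warning would carry the errno and its text so users can tell the harmless kernel-driver case from a genuine failure:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for VHOST_LOG_CONFIG(WARNING, ...): format the
 * suggested warning, including errno and its human-readable string, so
 * a user can distinguish "no vfio backend" from a real mapping error.
 */
static int format_map_warning(char *buf, size_t len, int err)
{
	return snprintf(buf, len,
		"VHOST_CONFIG: DMA engine map failed, errno %d (%s); "
		"harmless if the DMA device is bound to a kernel driver\n",
		err, strerror(err));
}
```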

> -
> -			VHOST_LOG_CONFIG(ERR, "DMA engine map failed\n");
> -			/* DMA mapping errors won't stop VHST_USER_SET_MEM_TABLE. */
> -			return 0;
> +			}
>  		}
> 
>  	} else {
> -		/* Remove mapped region from the default container of DPDK. */
> -		ret = rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
> -						   region->host_user_addr,
> -						   host_iova,
> -						   region->size);
> -		if (ret) {
> -			/* like DMA map, ignore the kernel driver case when unmap.
> */
> -			if (rte_errno == EINVAL)
> -				return 0;
> +		for (i = 0; i < dev->nr_guest_pages; i++) {
> +			page = &dev->guest_pages[i];
> +			ret = rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
> +							   page->host_user_addr,
> +							   page->host_phys_addr,
> +							   page->size);
> +			if (ret) {
> +				/* like DMA map, ignore the kernel driver case when unmap. */
> +				if (rte_errno == EINVAL)
> +					return 0;
> 
> -			VHOST_LOG_CONFIG(ERR, "DMA engine unmap failed\n");
> -			return ret;
> +				VHOST_LOG_CONFIG(ERR, "DMA engine unmap failed\n");
> +				return ret;

Same here.

And since you don't check the return value, the function doesn't need to return
anything; its return type can be void.
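A minimal sketch of that refactor (the `demo_page` struct, the `map_fn` callback standing in for `rte_vfio_container_dma_map()`, and the `mapped` counter are all illustrative, not DPDK API): the per-page loop returns void and simply keeps going, since SET_MEM_TABLE is never aborted on mapping errors anyway.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the DPDK guest-page table and vfio map call. */
struct demo_page {
	uint64_t host_user_addr;
	uint64_t host_phys_addr;
	uint64_t size;
};

typedef int (*map_fn)(const struct demo_page *page);

/* Demo callbacks: one that always maps, one that fails on zero-sized pages. */
static int demo_map_ok(const struct demo_page *page)
{
	(void)page;
	return 0;
}

static int demo_map_fail_on_empty(const struct demo_page *page)
{
	return page->size == 0 ? -1 : 0;
}

/*
 * Map every guest page; per-page failures are only counted, never
 * propagated, so nothing useful is returned and the type is void.
 */
static void async_dma_map_sketch(const struct demo_page *pages, size_t n,
				 map_fn map_one_page, size_t *mapped)
{
	size_t i;

	*mapped = 0;
	for (i = 0; i < n; i++)
		if (map_one_page(&pages[i]) == 0)
			(*mapped)++;
}
```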

Thanks,
Chenbo


> +			}
>  		}
>  	}
> 
> @@ -205,12 +208,12 @@ free_mem_region(struct virtio_net *dev)
>  	if (!dev || !dev->mem)
>  		return;
> 
> +	if (dev->async_copy && rte_vfio_is_enabled("vfio"))
> +		async_dma_map(dev, false);
> +
>  	for (i = 0; i < dev->mem->nregions; i++) {
>  		reg = &dev->mem->regions[i];
>  		if (reg->host_user_addr) {
> -			if (dev->async_copy && rte_vfio_is_enabled("vfio"))
> -				async_dma_map(reg, false);
> -
>  			munmap(reg->mmap_addr, reg->mmap_size);
>  			close(reg->fd);
>  		}
> @@ -978,7 +981,7 @@ vhost_user_set_vring_base(struct virtio_net **pdev,
> 
>  static int
>  add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr,
> -		   uint64_t host_phys_addr, uint64_t size)
> +		   uint64_t host_phys_addr, uint64_t host_user_addr, uint64_t size)
>  {
>  	struct guest_page *page, *last_page;
>  	struct guest_page *old_pages;
> @@ -1009,6 +1012,7 @@ add_one_guest_page(struct virtio_net *dev, uint64_t guest_phys_addr,
>  	page = &dev->guest_pages[dev->nr_guest_pages++];
>  	page->guest_phys_addr = guest_phys_addr;
>  	page->host_phys_addr  = host_phys_addr;
> +	page->host_user_addr = host_user_addr;
>  	page->size = size;
> 
>  	return 0;
> @@ -1028,7 +1032,8 @@ add_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg,
>  	size = page_size - (guest_phys_addr & (page_size - 1));
>  	size = RTE_MIN(size, reg_size);
> 
> -	if (add_one_guest_page(dev, guest_phys_addr, host_phys_addr, size) < 0)
> +	if (add_one_guest_page(dev, guest_phys_addr, host_phys_addr,
> +			       host_user_addr, size) < 0)
>  		return -1;
> 
>  	host_user_addr  += size;
> @@ -1040,7 +1045,7 @@ add_guest_pages(struct virtio_net *dev, struct rte_vhost_mem_region *reg,
>  		host_phys_addr = rte_mem_virt2iova((void *)(uintptr_t)
>  						  host_user_addr);
>  		if (add_one_guest_page(dev, guest_phys_addr, host_phys_addr,
> -				size) < 0)
> +				       host_user_addr, size) < 0)
>  			return -1;
> 
>  		host_user_addr  += size;
> @@ -1215,7 +1220,6 @@ vhost_user_mmap_region(struct virtio_net *dev,
>  	uint64_t mmap_size;
>  	uint64_t alignment;
>  	int populate;
> -	int ret;
> 
>  	/* Check for memory_size + mmap_offset overflow */
>  	if (mmap_offset >= -region->size) {
> @@ -1274,14 +1278,6 @@ vhost_user_mmap_region(struct virtio_net *dev,
>  			VHOST_LOG_CONFIG(ERR, "adding guest pages to region failed.\n");
>  			return -1;
>  		}
> -
> -		if (rte_vfio_is_enabled("vfio")) {
> -			ret = async_dma_map(region, true);
> -			if (ret) {
> -				VHOST_LOG_CONFIG(ERR, "Configure IOMMU for DMA engine failed\n");
> -				return -1;
> -			}
> -		}
>  	}
> 
>  	VHOST_LOG_CONFIG(INFO,
> @@ -1420,6 +1416,9 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg,
>  		dev->mem->nregions++;
>  	}
> 
> +	if (dev->async_copy && rte_vfio_is_enabled("vfio"))
> +		async_dma_map(dev, true);
> +
>  	if (vhost_user_postcopy_register(dev, main_fd, msg) < 0)
>  		goto free_mem_table;
> 
> --
> 2.17.1



Thread overview: 10+ messages
2021-11-10  5:46 [dpdk-dev] [PATCH] " Xuan Ding
2021-11-10  5:56 ` Xuan Ding
2021-11-10  6:06 ` [dpdk-dev] [PATCH v3] " Xuan Ding
2021-11-15  7:20   ` Xia, Chenbo [this message]
2021-11-15  8:13     ` Ding, Xuan
2021-11-15 12:11       ` Xia, Chenbo
2021-11-15 12:32 ` [PATCH v4] " Xuan Ding
2021-11-16  7:47   ` Xia, Chenbo
2021-11-16  8:24     ` Ding, Xuan
2021-11-17 14:39       ` Ding, Xuan
